While the long-ranged correlation of market orders and their impact on prices has been relatively
well studied in the literature, the corresponding studies of limit orders and cancellations are scarce. 
We provide here an empirical study of the cross-correlation between all these different events, and 
their respective impact on future price changes. We define and extract from the data the "bare" 
impact these events would have, if they were to happen in isolation. For large tick stocks, we 
show that a model where the bare impact of all events is permanent and non-fluctuating is in good 
agreement with the data. For small tick stocks, however, bare impacts must contain a history 
dependent part, reflecting the internal fluctuations of the order book. We show that this effect can 
be accurately described by an autoregressive model on the past order flow. This framework allows 
us to decompose the impact of an event into three parts: an instantaneous jump component, the 
modification of the future rates of the different events, and the modification of the jump sizes of 
future events. We compare in detail the present formalism with the temporary impact model that 
was proposed earlier to describe the impact of market orders when other types of events are not 
observed. Finally, we extend the model to describe the dynamics of the bid-ask spread. 
1. INTRODUCTION

The relation between order flow and price changes has attracted considerable attention in recent years [iHl]. To
the investors' dismay, trades on average impact the price in the direction of their transactions, i.e. buys push the price 
up and sells drive the price down. Although this sounds very intuitive, a little reflection shows that such a statement 
is far from trivial, for any buy trade in fact meets a sell trade, and vice-versa! On the other hand, there must indeed 
be a mechanism allowing information to be included into and reflected by prices. This is well illustrated by the Kyle 
model [3], where the trading of an insider progressively reveals his information by impacting the price. Traditionally, 
the above "one sell for one buy" paradox is resolved by arguing that there are in fact two types of traders coexisting 
in the ecology of financial markets: (i) "informed" traders who place market orders for immediate execution, at the 
cost of paying half the bid-ask spread, and (ii) uninformed (or less informed) market makers who provide liquidity 
by placing limit orders on both sides of the order book, hoping to earn part of the bid-ask spread. In this setting, 
there is indeed an asymmetry between a buyer, placing a market order at the ask, and the corresponding seller with
a limit order at the ask, and one can speak about a well defined impact of buy/sell (market) orders. The impact of 
market orders has therefore been empirically studied in great detail since the early nineties. As reviewed below, many 
surprising results have been obtained, such as a very weak dependence of impact on the volume of the market order, 
the long-range nature of the sign of the trades, and the resulting non-permanent, power-law decay of impact. 

The conceptual problem is that the distinction between informed trader and market maker is no longer obvious in 
the present electronic markets, where each participant can place both limit and market orders, depending on his own 
strategies, the current state of the order book, etc. Although there is still an asymmetry between a buy market order 
and a sell limit order that enables one to define the direction of the trade, "informed" traders too may choose to place 
limit orders, aiming to decrease execution costs. Limit orders must therefore also have an impact: adding a buy limit 
order induces extra upwards pressure, and cancelling a buy limit order decreases this pressure. Surprisingly, there are 
very few quantitative studies of the impact of these orders - partly due to the fact that more detailed data, beyond 
trades and quotes, is often needed to conduct such studies. As this paper was under review, we became aware of ref. 
Q, where a similar empirical study of the impact of limit orders is undertaken. 

The aim of the present paper is to provide a unified framework for the description of the impact of all order book 
events, at least at the best limits: market orders, limit orders and cancellations. We study the correlations between 
all event types and signs. Assuming an additive model of impact, we map out from empirical data (consisting purely
of trades and quotes information) the average individual impact of these orders. We find that the impact of limit 
orders is similar (albeit somewhat smaller) to that of market orders. 

We then compare these results to a simple model which assumes that all impacts are permanent in time. This 
works well for large tick stocks, for which the bid-ask spread is nearly constant, with no gaps in the order book. The 
discrepancies between this simple model and data from small tick stocks are then scrutinized in detail and attributed 
to the history dependence of the impact, which we are able to model successfully using a linear regression of the gaps 
on the past order flow. Our final model is specified in Sec. 7, Eq. (30). This framework allows us to measure more
accurately the average impact of all types of orders, and to assess precisely the importance of impact fluctuations due 
to changes in the gaps behind the best quotes. 

We want to insist on the fact that our study is mostly empirical and phenomenological, in the sense that we aim at 
establishing some stylized facts and building a parsimonious mathematical model to describe them without referring 
to any precise economic reasoning about the nature and motivations of the agents who place the orders. For recent 
papers along this latter direction, see e.g. [1, • 

The outline of this paper is as follows. We first review (Sec. 2) the relevant results on the impact of market
orders and set the mathematical framework within which we will analyze our order book event data. We explain in
particular why the market order impact function measured in previous studies is in fact "dressed" by the impact of
other events (limit orders, cancellations), and by the history dependence of the impact. We also relate our formalism
to Hasbrouck's Vector Autoregression framework. We then turn to the presentation of the data we have analyzed (Sec.
3), and of the various correlation functions that one can measure (Sec. 4). From these we determine the individual (or
"bare"), lag-dependent impact functions of the different events occurring at the bid price or at the ask price (Sec. 5).
We introduce a simplified model where these impact functions are constant in time, and show that this gives a good
approximate account of our data for large tick stocks, while significant discrepancies appear for small tick stocks (Sec.
6). The systematic differences are explained by the dynamics of order flow deeper in the book, which can be modeled
as a history dependent correction to the linear impact model (Sec. 7, see Eq. (30)). Our results are summarized in
the conclusion, with open issues that would deserve more detailed investigation. In the Appendices we also show how
the bid-ask spread dynamics can be accounted for within the framework introduced in the main text (Appendix A),
and give some supplementary information concerning the different empirical correlations that can be measured
(Appendix B).
2. IMPACT OF MARKET ORDERS: A SHORT REVIEW



2.1. The transient impact model 



Quantitative studies of the price impact of market orders have by now firmly established a number of stylized facts,
some of which appear rather surprising at first sight. The salient points are (for a recent review and references, see S):

• Buy (sell) trades on average impact the price up (down). In other words, there is a strong correlation between
price returns over a given time interval and the market order imbalance on the same interval. 

• The impact curve as a function of the volume of the trade is strongly concave. In other words, large volumes 
impact the price only marginally more than small volumes. 

• The sign of market orders is strongly autocorrelated in time. Despite this, the dynamics of the midpoint is very 
close to purely diffusive. 

A simple model encapsulating these empirical facts assumes that the mid-point price $p_t$ can be written at (trade) time
$t$ as a linear superposition of the impact of past trades 0,01^1:

$$p_t = \sum_{t'<t} \left[ G(t-t')\,\epsilon_{t'} v_{t'}^{\theta} + n_{t'} \right] + p_{-\infty}, \qquad (1)$$

where $v_{t'}$ is the volume of the trade at time $t'$, $\epsilon_{t'}$ the sign of that trade (+ for a buy, - for a sell), and $n_t$ is an
independent noise term that models any price change not induced by trades (e.g. jumps due to news). The exponent $\theta$
is small; the dependence on $v$ might in fact be logarithmic ($\theta \to 0$). The most important object in the above equation
is the function $G(t-t')$, which describes the temporal evolution of the impact of a single trade and can be called
a 'propagator': how does the impact of the trade at time $t' < t$ propagate, on average, up to time $t$? We discuss in
section 2.4 below how Eq. (1) is related to Hasbrouck's VAR model [H Il2|.
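As a purely illustrative sketch (not part of the original analysis), the superposition in Eq. (1) can be evaluated numerically as follows. The function name `propagator_price`, the array convention `G[k] = G(k+1)` and the default $\theta = 0$ are assumptions made for this example only.

```python
import numpy as np

def propagator_price(signs, volumes, G, theta=0.0, noise=None, p_start=0.0):
    """Mid-price path implied by Eq. (1).

    p_t = sum_{t'<t} [G(t-t') * eps_{t'} * v_{t'}**theta + n_{t'}] + p_start

    G[k] is assumed to hold G(k+1), so len(G) must be at least len(signs) - 1.
    """
    xi = np.asarray(signs, dtype=float) * np.asarray(volumes, dtype=float) ** theta
    G = np.asarray(G, dtype=float)
    T = len(xi)
    n = np.zeros(T) if noise is None else np.asarray(noise, dtype=float)
    p = np.full(T, p_start, dtype=float)
    for t in range(1, T):
        lags = t - np.arange(t)          # t - t' for t' = 0, ..., t-1
        p[t] += np.dot(G[lags - 1], xi[:t]) + n[:t].sum()
    return p

# A permanent, constant propagator simply accumulates the past signs.
path = propagator_price([1, 1, -1, 1], [5, 5, 5, 5], np.ones(3))
```

With a decaying kernel such as `np.arange(1.0, T) ** -beta` in place of `np.ones(3)`, the same function gives the transient-impact price path discussed in the text.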

An important result, derived in earlier work, is that $G(t-t')$ must decay with time in a very specific way, such as to off-set
the autocorrelation of the trades, and maintain the (statistical) efficiency of prices. Clearly, if $G(t-t')$ did not
decay at all, the returns would simply be proportional to the sign of the trades, and therefore would themselves be
strongly autocorrelated in time. The resulting price dynamics would then be highly predictable, which is not the case.
Conversely, if $G(t-t')$ decayed to zero immediately, the price as given by Eq. (1) would oscillate within a limited
range, and the long-term volatility would be zero. The result of 0] is that if the correlation of signs $C(\ell) = \langle \epsilon_t \epsilon_{t+\ell} \rangle$
decays at large $\ell$ as $\ell^{-\gamma}$ with $\gamma < 1$ (as found empirically), then $G(t-t')$ must decay as $|t-t'|^{-\beta}$ with $\beta = (1-\gamma)/2$
for the price to be exactly diffusive at long times. The impact of single trades is therefore predicted to decay as a
power-law (at least up to a certain time scale), at variance with simple models that assume that the impact decays
exponentially to a non-zero "permanent" value. More generally, one can use the empirically observable impact function
$\mathcal{R}(\ell)$, defined as:

$$\mathcal{R}(\ell) = \langle (p_{t+\ell} - p_t) \cdot \xi_t \rangle \qquad (2)$$

and the time correlation function $C(\ell)$ of the variable $\xi_t = \epsilon_t v_t^{\theta}$ to map out, numerically, the complete shape of
$G(t-t')$. This was done in earlier work using the exact relation:

$$\mathcal{R}(\ell) = \sum_{0<n\le\ell} G(n)\,C(\ell-n) + \sum_{n>\ell} G(n)\,C(n-\ell) - \sum_{n>0} G(n)\,C(n). \qquad (3)$$
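To make the relation between $\mathcal{R}$, $G$ and $C$ concrete, the following sketch evaluates the right-hand side of Eq. (3) for given arrays. It is an illustration under stated assumptions: the name `response_from_propagator` is hypothetical, the sums over $n$ are truncated at the length of the supplied propagator, and $C$ is taken to be even in its argument and zero beyond the supplied lags.

```python
import numpy as np

def response_from_propagator(G, C, max_lag):
    """Evaluate Eq. (3) with truncated sums.

    G[n] = G(n) for n >= 1 (G[0] is ignored); C[n] = C(n) for n >= 0.
    Returns R with R[l] = R(l) for 1 <= l <= max_lag (R[0] is left at 0).
    """
    G = np.asarray(G, dtype=float)
    C = np.asarray(C, dtype=float)
    n_max = len(G) - 1

    def corr(k):
        k = abs(k)                        # C is even: C(-k) = C(k)
        return C[k] if k < len(C) else 0.0

    # Last term of Eq. (3): sum_{n>0} G(n) C(n)
    baseline = sum(G[n] * corr(n) for n in range(1, n_max + 1))
    R = np.zeros(max_lag + 1)
    for lag in range(1, max_lag + 1):
        # First two terms combined: sum_{n>0} G(n) C(|lag - n|)
        total = sum(G[n] * corr(lag - n) for n in range(1, n_max + 1))
        R[lag] = total - baseline
    return R

# Sanity check: for uncorrelated xi (C(0) = 1, C(n > 0) = 0), R(l) = G(l).
G = np.array([0.0, 1.0, 0.8, 0.6, 0.5])
C = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
R = response_from_propagator(G, C, 4)
```

In practice one goes the other way, solving this linear system for $G$ given measured $\mathcal{R}$ and $C$; the forward evaluation above is only meant to show how the three terms combine.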

This analysis is repeated in a more general setting below (see Sec. 5 and Eq. (16)). The above model, however, is
approximate and incomplete in two interrelated ways.



^1 In the following, we only focus on price changes over small periods of time, so that the following additive model is adequate. For longer
time scales, one should worry about multiplicative effects, which in this formalism would naturally arise from the fact that the bid-ask
spread, and the gaps in the order book, are a fraction of the price. Therefore, the impact itself, $G$, is expected to be proportional to a
moving average of the price. See llll for a discussion of this point.
• First, Eq. (1) neglects the fluctuations of the impact: one expects $G(t' \to t)$, which is the impact of a trade
at some time $t'$ measured until a later time $t$, to depend both on $t$ and $t'$ and not only on $t - t'$. Its formal
definition is given by:

$$G(t' \to t) = \frac{\partial p_t}{\partial \xi_{t'}}, \qquad \xi_t \equiv \epsilon_t v_t^{\theta}. \qquad (4)$$

Impact can indeed be quite different depending on the state of the order book and the market conditions
at $t'$. As a consequence, if one blindly uses Eq. (1) to compute the second moment of the price difference,
$D(\ell) = \langle (p_{t+\ell} - p_t)^2 \rangle$, with a non-fluctuating $G(\ell)$ calibrated to reproduce the impact function $\mathcal{R}(\ell)$, the result
clearly underestimates the empirical price variance: see Fig. 1. Adding a diffusive noise $n_t$ would only shift
$D(\ell)/\ell$ upwards, but this is clearly insufficient to reproduce the empirical data.

• Second, other events of the order book can also change the mid-price, such as limit orders placed inside the 
bid-ask spread, or cancellations of all the volume at the bid or the ask. These events do indeed contribute to 
the price volatility and should be explicitly included in the description. A simplified description of price changes 
in terms of market orders only attempts to describe other events of the order book in an effective way, through 
the non-trivial time dependence of $G(\ell)$.
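The quantity $D(\ell)/\ell$ used in this comparison can be estimated from any price series sampled in event (or trade) time. The sketch below is an assumption-laden illustration, not the authors' code; `variogram_per_lag` is a hypothetical name and the input is taken to be a one-dimensional array of mid-prices.

```python
import numpy as np

def variogram_per_lag(prices, max_lag):
    """Return lags l = 1..max_lag and D(l)/l, with D(l) = <(p_{t+l} - p_t)^2>."""
    p = np.asarray(prices, dtype=float)
    lags = np.arange(1, max_lag + 1)
    D = np.array([np.mean((p[l:] - p[:-l]) ** 2) for l in lags])
    return lags, D / lags

# A perfectly trending path p_t = t gives D(l) = l^2, hence D(l)/l = l.
lags, d_over_l = variogram_per_lag(np.arange(5), 3)
```

For a purely diffusive price, $D(\ell)/\ell$ is flat in $\ell$; a curve that grows with $\ell$, as in the trending toy input above, signals positively autocorrelated returns.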




Figure 1: $D(\ell)/\ell$ and its approximation with the temporary impact model with only trades as events, with $n_t = 0$ and for
small tick stocks. Results are shown when assuming that all trades have the same, non fluctuating, impact $G(\ell)$, calibrated
to reproduce $\mathcal{R}(\ell)$. This simple model accounts for ~ 2/3 of the long term volatility. Other events and/or the fluctuations of
impact $G(\ell)$ must therefore contribute to the market volatility as well.



2.2. History dependence of the impact function 



Let us make the above statements more transparent on toy-models. First, the assumption of a stationary impact
function $G(t' \to t) = G(t-t')$ is clearly an approximation. The past order flow ($< t'$) should affect the way the trade
at time $t'$ impacts the price or, as argued by Lillo and Farmer, liquidity may be history dependent [gI [isl IT^.
Suppose for simplicity that the variable $\xi_t = \epsilon_t v_t^{\theta}$ is Gaussian (which turns out to be a good approximation) and that
its impact is permanent but history dependent. If we assume that the past order flow has a small influence on the
impact, we can formally expand $G$ in powers of all past $\xi$'s to get:



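$$G(t' \to t) = G_0 + \sum_{t_1<t'} G_1(t'-t_1)\,\xi_{t_1} + \sum_{t_1,t_2<t'} G_2(t'-t_1, t'-t_2)\,\xi_{t_1}\xi_{t_2} + \ldots \qquad (5)$$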



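Using the fact that the $\xi$'s are Gaussian with zero mean, one finds that the impact function $\mathcal{R}(\ell)$ within this toy-model is
given by:

$$\mathcal{R}(\ell) = \sum_{0 \le n < \ell} C(n) \left[ G_0 + \sum_{n_1,n_2>0} G_2(n_1,n_2)\,C(n_1-n_2) \right] + 2 \sum_{0 \le n < \ell}\ \sum_{n_1,n_2>0} G_2(n_1,n_2)\,C(n_1)\,C(n-n_2). \qquad (6)$$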



If one compares this expression with Eq. (3) to extract an effective propagator $G(\ell)$, it is clear that the resulting
solution will have some non-trivial time dependence induced by the third term, proportional to $G_2$. Note that within
this toy-model, $G_1$ does not contribute to $\mathcal{R}(\ell)$ but contributes to the volatility $D(\ell)$.
2.3. The role of hidden events



Imagine now that two types of events are important for the dynamics of the price. Events of the first type are
characterized by a random variable $\xi_t$ (e.g., $\xi_t = \epsilon_t v_t^{\theta}$ in the above example), whereas events of the second type (say
limit orders) are characterized by another random variable $\eta_t$. The "full" dynamical equation for the price is given by:

$$p_t = \sum_{t'<t} G_1(t-t')\,\xi_{t'} + \sum_{t'<t} G_2(t-t')\,\eta_{t'} + p_{-\infty}. \qquad (7)$$



Imagine, however, that events of the second type are not observed. If for simplicity the $\xi$'s and $\eta$'s are correlated Gaussian
random variables, one can always express the $\eta$'s as a linear superposition of past $\xi$'s and find a model in terms of $\xi$'s
only, plus an uncorrelated 'noise' component $n_t$ coming from the unobserved events:

$$p_t = \sum_{t'<t} \left[ G_1(t-t')\,\xi_{t'} + G_2(t-t') \sum_{t'' \le t'} \Xi(t'-t'')\,\xi_{t''} + n_{t'} \right] + p_{-\infty}. \qquad (8)$$



$\Xi$ is the linear filter allowing one to predict the $\eta$'s in terms of the past $\xi$'s. It can be expressed in a standard way in
terms of the correlation function of the $\eta$'s and the cross-correlation between $\xi$'s and $\eta$'s. Notice that the previous
equation can be recast in the form of Eq. (1) plus noise, with an effective propagator "dressed" by the influence of
the unobserved events:

$$G(\ell) = G_1(\ell) + \sum_{0<\ell' \le \ell} G_2(\ell')\,\Xi(\ell-\ell'). \qquad (9)$$



From this equation, it is clear that a non-trivial time dependence of $G$ can arise even if the 'true' propagators $G_1$ and $G_2$
are time independent; in other words, the decay of the impact of a single market order is in fact a consequence of the
interplay of market and limit order flow. As a trivial example, suppose both bare propagators are equal and constant
in time ($G_1(\ell) = G_2(\ell) = G$) and $\eta_t = -\xi_t$, $\forall t$. This means that the two types of events impact the price but exactly
cancel each other. Then, $\Xi(\ell) = -\delta_{\ell,0}$ and $G(\ell) = 0$, as it should: the dressed impact of events of the first type is zero.
This is an idealized version of the asymmetric liquidity model of Lillo and Farmer mentioned above [13].
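The "dressing" mechanism of Eq. (9) is easy to check numerically. The sketch below is illustrative only: the name `dressed_propagator` is hypothetical, all arrays are indexed by lag, and the sum is assumed to include the term $\ell' = \ell$, so that `Xi[0]` is the coefficient linking simultaneous events.

```python
import numpy as np

def dressed_propagator(G1, G2, Xi):
    """Effective propagator G(l) = G1(l) + sum_{0 < l' <= l} G2(l') * Xi(l - l').

    G1[l], G2[l] hold the bare propagators at lag l >= 1 (index 0 is ignored);
    Xi[k] holds the linear filter at lag k >= 0 and needs len(Xi) >= len(G1) - 1.
    """
    G1 = np.asarray(G1, dtype=float)
    G2 = np.asarray(G2, dtype=float)
    Xi = np.asarray(Xi, dtype=float)
    L = len(G1) - 1
    G = np.zeros(L + 1)
    for lag in range(1, L + 1):
        G[lag] = G1[lag] + sum(G2[lp] * Xi[lag - lp] for lp in range(1, lag + 1))
    return G

bare = np.array([0.0, 1.0, 1.0, 1.0, 1.0])       # constant bare impact

# Trivial example of the text: eta_t = -xi_t, i.e. Xi(l) = -delta_{l,0}.
cancelled = dressed_propagator(bare, bare, [-1.0, 0.0, 0.0, 0.0, 0.0])

# A delayed, partial compensation produces a decaying dressed propagator.
decaying = dressed_propagator(bare, bare, [0.0, -0.5, 0.0, 0.0, 0.0])
```

In the second case both bare propagators are constant in time, yet the dressed impact drops from 1 at lag 1 to 0.5 at later lags, which is the point made in the text.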

The aim of this paper is to investigate a model of impact similar to Eqs. (1) and (7), but where a wider class
of order book events are explicitly taken into account. This will allow us to extract the corresponding single event
impact functions, and study their time evolution. As a test for the completeness and accuracy of the model, the time
behavior of other observables, such as the second moment of the price difference, should be correctly accounted for.
We start by presenting the data and extra notations which will be useful in the sequel. We then discuss the different 
correlation and response functions that can be measured on the data. 



2.4. Relation with Hasbrouck's VAR model 



At this stage, it is interesting to relate the above 'propagator' framework encoded in Eq. (1) and the econometric
Vector Autoregressive (VAR) model proposed by Hasbrouck, which became a standard in the microstructure
literature. In its original formulation, the VAR model is a joint linear regression of the present price return $r_t$ and
signed volume $x_t = \epsilon_t v_t$ onto their past realisations, or more precisely:

$$r_t = \sum_{t'<t} B_{rr}(t-t')\,r_{t'} + B_{xr}(0)\,x_t + \sum_{t'<t} B_{xr}(t-t')\,x_{t'} + n_r(t),$$
$$x_t = \sum_{t'<t} B_{rx}(t-t')\,r_{t'} + \sum_{t'<t} B_{xx}(t-t')\,x_{t'} + n_x(t), \qquad (10)$$

where $n_{r,x}$ are i.i.d. noises and the $B(t-t')$ are regression coefficients, to be determined. Eq. (1) can be seen
as a special case of the VAR model, Eq. (10), provided the following identifications/modifications are made: a)
$x_t \to \xi_t = \epsilon_t v_t^{\theta}$; b) the coefficients $B_{rr}(\ell)$ are assumed to be zero; c) since Eq. (1) models prices and not returns, one
has $G(\ell) = \sum_{0 \le \ell' < \ell} B_{xr}(\ell')$. $G(\infty)$ is called the information content of a trade in Hasbrouck's framework; d) finally,
although the autocorrelation $C(\ell)$ of the $\xi_t$ is measured, the dynamical model for the $\xi_t$ is left unspecified.

Although the two models are very similar at the formal level, the major distinction lies in the interpretation,
which in fact illustrates the difference between econometric models and structural models. Whereas the VAR model
postulates a general, noisy linear relation between two sets (or more) of variables and determines the coefficients via
least squares, structural models insist on a microscopic mechanism that leads to an a priori structure of the model
and an interpretation of the coefficients. Eq. (1) is a causal model for impact, which postulates that the current
price is a result of the impact of all past trades, plus some noise contribution $n_t$ that represents price moves not
related to trades (for example, quote revisions after some news announcements). In this context, there is no natural
interpretation for the $B_{rr}$ coefficients, which must be zero: past price changes cannot by themselves influence the
present price, although these may of course affect the order flow $\xi_t$, which in turn impacts 'physically' the price. On
the other hand, the interpretation in terms of impact allows one to anticipate the limitations of the model and to
suggest possible improvements, by including more events or by allowing for some history dependence, as discussed in
the above subsections.
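Identification c) above says that the propagator is the running sum of the VAR coefficients $B_{xr}$, and conversely that the coefficients are the increments of the propagator. A minimal sketch of this correspondence follows; the function names and the array conventions (`B_xr[k] = B_xr(k)`, output `G[k] = G(k+1)`) are assumptions made for illustration.

```python
import numpy as np

def propagator_from_var(B_xr):
    """G(l) = sum_{0 <= l' < l} B_xr(l'), returned for l = 1, ..., len(B_xr)."""
    return np.cumsum(np.asarray(B_xr, dtype=float))

def var_from_propagator(G):
    """Inverse map: B_xr(0) = G(1) and B_xr(l) = G(l+1) - G(l) for l >= 1."""
    return np.diff(np.asarray(G, dtype=float), prepend=0.0)

B = np.array([1.0, -0.2, -0.1])
G = propagator_from_var(B)        # impact at lags 1, 2, 3
B_back = var_from_propagator(G)   # recovers the regression coefficients
```

Negative coefficients at positive lags thus correspond to a decaying propagator, and the last entry of `G` approximates the "information content" $G(\infty)$ once the coefficients have died out.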

The aim of the present paper is to justify fully this structural modeling strategy by accounting for all events in the 
order book. In this case, the variation of the price can be tautologically decomposed in terms of these events, and 
the corresponding regression coefficients have a transparent interpretation. Furthermore, the limitations of a purely 
linear model appear very clearly as the history dependence of impact may induce explicit non-linearities (see Sec. 7).

3. DATA AND NOTATIONS 

In this paper we analyze data on 14 of the most liquid stocks traded on NASDAQ during the period 03/03/2008 -
19/05/2008, a total of 53 trading days. The particular choice of market is not very important: many of our results
were also verified on other markets (such as CME Futures, US Treasury Bonds and stocks traded at London Stock
Exchange^2), as well as on other time periods, and they appear fairly robust.

We only consider the usual trading time between 9:30 and 16:00; all other periods are discarded. We will always use
ticks (0.01 US dollars) as the units of price. We will use the name "event" for any change that modifies the bid or ask 
price, or the volume quoted at these prices. Events deeper in the order book are unobserved and will not be described: 
although they do not have an immediate effect on the best quotes, our description will still be incomplete; in line with 
the previous section, we know that these unobserved events may "dress" the impact of the observed events. 

Events will be used as the unit of time. This "event time" is similar to, but more detailed than, the notion of transaction
time used in many recent papers. Since the dependence of impact on the volume of the trades is weak @,[l^, we have
chosen to classify events not according to their volume but according to whether they change the mid-point or not.
This strong dichotomy is another approximation to keep in mind. It leads to six possible types of events^3:

• market orders^4 that do not change the best price (noted MO^0) or that do (noted MO'),

• limit orders at the current bid or ask (LO^0) or inside the bid-ask spread so that they change the price (LO'),

• and cancellations at the bid or ask that do not remove all the volume quoted there (CA^0) or that do (CA').

The upper index ' ("prime") will thus denote that the event changed any of the best prices, and the upper index 0 that
it did not. Abbreviations without the upper index (MO, CA, LO) refer to both the price changing and the non-price
changing event type. The type of the event occurring at time $t$ will be denoted by $\pi_t$.
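The six-way classification can be written as a small function. This is a sketch under explicit assumptions rather than the authors' procedure: the name `classify_event`, its arguments, and the rule that a market order or cancellation changes the price when its volume is at least the volume quoted at the best are all introduced here for illustration.

```python
def classify_event(kind, volume=0, best_volume=0, inside_spread=False):
    """Return one of "MO0", "MO'", "LO0", "LO'", "CA0", "CA'".

    kind          -- "MO" (market order), "LO" (limit order) or "CA" (cancellation)
    volume        -- size of the market order or of the cancelled volume
    best_volume   -- volume quoted at the affected best price before the event
    inside_spread -- for limit orders: True if placed strictly inside the spread
    """
    if kind == "LO":
        changes_price = inside_spread
    elif kind in ("MO", "CA"):
        # Assumption: removing the whole queue at the best moves the quote.
        changes_price = volume >= best_volume
    else:
        raise ValueError(f"unknown event kind: {kind!r}")
    return kind + ("'" if changes_price else "0")

labels = [
    classify_event("MO", volume=100, best_volume=300),   # partial fill at the best
    classify_event("MO", volume=300, best_volume=300),   # best level emptied
    classify_event("LO", inside_spread=True),            # new best quote
    classify_event("CA", volume=50, best_volume=200),    # partial cancellation
]
```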

Our sample of stocks can be divided into two groups: large tick and small tick stocks. Large tick stocks are such 
that the bid-ask spread is almost always equal to one tick, whereas small tick stocks have spreads that are typically 
a few ticks. The behavior of the two groups is quite different, and this will be emphasized throughout the paper. For 
example, the events which change the best price have a relatively low probability for large tick stocks (about 3% 
altogether), but not for small tick stocks (up to 40%). Table II shows a summary of stocks, and some basic statistics.
Note that there are a number of stocks with intermediate tick sizes, which to some extent possess the characteristics of
both groups. Technically, they can be treated in exactly the same way as small tick stocks, and all our results remain 
valid. However, for the clarity of presentation, we will not consider them explicitly in this paper. 

Every event is given a sign $\epsilon_t$ according to its expected long-term effect on the price. For market orders this
corresponds to usual order signs, i.e., $\epsilon_t = 1$ for buy market orders (at the ask price) and $-1$ for sell market orders (at



^2 The results for these markets are not reproduced here, for lack of space, but the corresponding data is available on request.
^3 Our data also included a small number (≈ 0.3%) of marketable (or crossing) limit orders. In principle these could have been treated as
a market order (and a consequent limit order for the remaining volume if there was any). Due to technical limitations we decided to
instead remove these events and the related price changes.
^4 To identify multiple trades that are initiated by the same market order, we consider as one market order all the trades in a given stock
that occur on the same side of the book within a millisecond. Such a time resolution is sufficient for distinguishing trades initiated by
different parties even at times of very intense trading activity.
| $\pi$ | event definition | event sign definition | gap definition ($\Delta_{\pi,\epsilon}$) |
|---|---|---|---|
| $\pi$ = MO^0 | market order, volume < outstanding volume at the best | $\epsilon = \pm 1$ for buy/sell market orders | 0 |
| $\pi$ = MO' | market order, volume ≥ outstanding volume at the best | $\epsilon = \pm 1$ for buy/sell market orders | half of first gap behind the ask ($\epsilon = 1$) or bid ($\epsilon = -1$) |
| $\pi$ = CA^0 | partial cancellation of the bid/ask queue | $\epsilon = \mp 1$ for buy/sell side cancellation | 0 |
| $\pi$ = LO^0 | limit order at the current best bid/ask | $\epsilon = \pm 1$ for buy/sell limit orders | 0 |
| $\pi$ = CA' | complete cancellation of the best bid/ask | $\epsilon = \mp 1$ for buy/sell side cancellation | half of first gap behind the ask ($\epsilon = 1$) or bid ($\epsilon = -1$) |
| $\pi$ = LO' | limit order inside the spread | $\epsilon = \pm 1$ for buy/sell limit order | half distance of limit order from the earlier best quote on the same side |

Table I: Summary of the 6 possible event types, the corresponding definitions of the event signs and gaps.



the bid price). Cancelled sell limit orders and incoming buy limit orders both have $\epsilon_t = 1$, while others have $\epsilon_t = -1$.
The above definitions are summarized in Table I. Note that the table also defines the gaps $\Delta_{\pi,\epsilon}$, which will be used
later.

It will also be useful to define another sign variable corresponding to the "side" of the event at time $t$, which will
be denoted by $s_t$. It indicates whether the event $t$ took place at the bid ($s_t = -1$) or the ask ($s_t = 1$), thus:

$$s_t = \begin{cases} \epsilon_t & \text{if } \pi_t = \mathrm{MO}^0, \mathrm{MO}', \mathrm{CA}^0 \text{ or } \mathrm{CA}', \\ -\epsilon_t & \text{if } \pi_t = \mathrm{LO}^0 \text{ or } \mathrm{LO}'. \end{cases} \qquad (11)$$

The difference between $\epsilon$ and $s$ arises because limit orders correspond to the addition, not the removal, of volume, and
thus they push prices away from the side of the book where they occur.
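The mapping from the event sign to the side of the book can be illustrated with a few lines of code. The function name `event_side` and the string labels for event types (two-letter prefix "MO", "LO" or "CA") are hypothetical conventions chosen for this sketch.

```python
def event_side(eps, event_type):
    """Side s_t of an event: +1 at the ask, -1 at the bid, following Eq. (11).

    eps        -- event sign, +1 or -1
    event_type -- label starting with "MO", "CA" or "LO" (e.g. "MO0", "LO'")
    """
    kind = event_type[:2]
    if kind in ("MO", "CA"):
        return eps        # volume is removed on the side the price moves towards
    if kind == "LO":
        return -eps       # volume is added on the opposite side
    raise ValueError(f"unknown event type: {event_type!r}")

# A buy market order (eps = +1) hits the ask; a buy limit order (eps = +1) sits at the bid;
# a cancelled sell limit order (eps = +1) disappears from the ask.
sides = [event_side(+1, "MO0"), event_side(+1, "LO0"), event_side(+1, "CA'")]
```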

In the following calculations we will sometimes rely on indicator variables denoted as $I(\pi_t = \pi)$. This expression is
1 if the event at $t$ is of type $\pi$ and zero otherwise. In other words, $I(\pi_t = \pi) = \delta_{\pi_t,\pi}$, where $\delta$ is the Kronecker delta.
We will also use the notation $\langle \cdot \rangle$ to denote the time average of the quantity between the brackets. For example, the
unconditional probability of the event type $\pi$ can be, by definition, calculated as $P(\pi) = \langle I(\pi_t = \pi) \rangle$.

The indicator notation, although sometimes heavy, simplifies the formal calculation of some conditional expectations.
For example, if a quantity $X_{\pi,t}$ depends on the event type $\pi$ and the time $t$, then its conditional expectation at times
of $\pi$-type events is

$$\langle X_{\pi_t,t} \mid \pi_t = \pi \rangle = \frac{\langle X_{\pi,t}\, I(\pi_t = \pi) \rangle}{P(\pi)}.$$

Also, by definition

$$\sum_{\pi} I(\pi_t = \pi) = 1, \quad \text{and} \quad \sum_{\pi} X_{\pi,t}\, I(\pi_t = \pi) = X_{\pi_t,t}. \qquad (12)$$
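In code, the indicator notation reduces to boolean masks and time averages. The following sketch is illustrative; `event_probability` and `conditional_mean` are hypothetical names, and the toy event sequence is invented purely to show the arithmetic.

```python
import numpy as np

def event_probability(types, pi):
    """P(pi) = <I(pi_t = pi)>: fraction of events of type pi."""
    return float(np.mean(np.asarray(types) == pi))

def conditional_mean(X, types, pi):
    """<X | pi_t = pi> = <X_t I(pi_t = pi)> / P(pi)."""
    X = np.asarray(X, dtype=float)
    indicator = (np.asarray(types) == pi).astype(float)
    return float(np.mean(X * indicator) / np.mean(indicator))

types = ["MO0", "LO0", "LO0", "CA0"]      # toy sequence of event types
X = [1.0, 2.0, 4.0, 8.0]                  # some quantity observed at each event

p_lo = event_probability(types, "LO0")    # two of the four events are LO0
x_lo = conditional_mean(X, types, "LO0")  # average of X over the LO0 events
```

The ratio of time averages used in `conditional_mean` is the same as averaging `X` over the masked subset, which is why the indicator form is convenient in formal manipulations.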



4. CORRELATION AND RESPONSE FUNCTIONS 



In this section, we study the empirical temporal correlation of the different events defined above, and the response 
function to these events.



4.1. The autocorrelation of $\epsilon$ and $s$



We first investigate the autocorrelation function of the event signs, calculated as $\langle \epsilon_{t+\ell} \cdot \epsilon_t \rangle$. These are found to be
short-ranged, see Fig. 2, where the correlation function dies out after 10-100 trades, corresponding to typically 10
seconds in real time. This is in contrast with several other papers 0, H, [3, where $\epsilon$'s are calculated for market
orders only ($\pi$ = MO^0, MO'), and those signs are known to be strongly persistent among themselves, with, as recalled
in Sec. 2, a correlation decaying as a slow power law. However, the direction of incoming limit orders is negatively
| group | ticker | P(MO^0) | P(MO') | P(CA^0) | P(LO^0) | P(CA') | P(LO') | mean spread (ticks) | mean price (USD) | time/event (sec) |
|---|---|---|---|---|---|---|---|---|---|---|
| large tick | AMAT | 0.042 | 0.011 | 0.39 | 0.54 | 0.0018 | 0.013 | 1.11 | 17.45 | 0.16 |
| large tick | CMCSA | 0.040 | 0.0065 | 0.41 | 0.53 | 0.0021 | 0.0087 | 1.12 | 20.29 | 0.15 |
| large tick | CSCO | 0.051 | 0.0085 | 0.40 | 0.53 | 0.0010 | 0.0096 | 1.08 | 67.77 | 0.10 |
| large tick | DELL | 0.042 | 0.0087 | 0.40 | 0.54 | 0.0019 | 0.011 | 1.10 | 20.22 | 0.17 |
| large tick | INTC | 0.052 | 0.0073 | 0.40 | 0.54 | 0.00080 | 0.0081 | 1.08 | 19.43 | 0.12 |
| large tick | MSFT | 0.054 | 0.0087 | 0.40 | 0.53 | 0.0012 | 0.010 | 1.09 | 27.52 | 0.098 |
| large tick | ORCL | 0.050 | 0.0090 | 0.40 | 0.54 | 0.0012 | 0.010 | 1.09 | 20.86 | 0.16 |
| small tick | AAPL | 0.043 | 0.076 | 0.32 | 0.33 | 0.077 | 0.16 | 3.35 | 140.56 | 0.068 |
| small tick | AMZN | 0.038 | 0.077 | 0.26 | 0.31 | 0.12 | 0.20 | 3.70 | 70.68 | 0.21 |
| small tick | APOL | 0.042 | 0.080 | 0.24 | 0.33 | 0.11 | 0.20 | 3.78 | 55.24 | 0.40 |
| small tick | COST | 0.054 | 0.069 | 0.27 | 0.36 | 0.082 | 0.16 | 2.62 | 67.77 | 0.39 |
| small tick | ESRX | 0.042 | 0.074 | 0.24 | 0.32 | 0.12 | 0.20 | 4.12 | 60.00 | 0.63 |
| small tick | GILD | 0.052 | 0.043 | 0.34 | 0.46 | 0.032 | 0.077 | 1.64 | 48.23 | 0.23 |

Table II: Summary statistics for all stocks, showing the probability of the different events, the mean spread in ticks, the mean
price in dollars and the average time between events in seconds. The last column shows the total number of events in the
sample.



correlated with cancellations and market orders. Because the $\epsilon$ time series contains all types of events, the mixture
of long-range positive and negative correlations balances such that only short-range persistence remains. Any other
result would be incompatible with little predictability in price returns. As illustrated by the toy example of Sec. 2,
Eq. (9), this mixing process in fact maintains statistical market efficiency, i.e. weak autocorrelation of price changes.

When limit orders and cancellations are included, one can independently analyze the persistence of the side $s_t$
of the events. According to Eq. (11), this means flipping the event signs of limit orders in the $\epsilon$ time series, while
keeping the rest unchanged. This change reverses the compensation mechanism discussed above, and $s$ is found to
have long-range correlations in time: $\langle s_{t+\ell} \cdot s_t \rangle$ is shown in Fig. 2 and decays as $\ell^{-\gamma}$ with $\gamma \approx 0.7$. This long range
decay is akin to the long range persistence of market order signs discussed throughout the literature: since market
orders tend to persistently hit one side of the book, one expects more limit orders and cancellations on the same side
as well. Intuitively, if a large player splits his order and buys or sells using market orders for a long period of time,
this will attract compensating limit orders on the same side of the book.
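The compensation mechanism can be illustrated on a deliberately artificial event sequence. In the sketch below (hypothetical function name `sign_autocorrelation`, invented toy data), buy market orders alternate with sell limit orders that refill the ask: the $\epsilon$ series alternates in sign, while the side series $s$ is perfectly persistent.

```python
import numpy as np

def sign_autocorrelation(x, max_lag):
    """Return <x_{t+l} x_t> for l = 1, ..., max_lag (no mean subtraction)."""
    x = np.asarray(x, dtype=float)
    return np.array([np.mean(x[l:] * x[:-l]) for l in range(1, max_lag + 1)])

# Toy sequence: buy market order (eps = +1), then a sell limit order (eps = -1), repeated.
eps = np.array([1, -1] * 4)
is_limit = np.array([False, True] * 4)

# Eq. (11): the side equals eps, except for limit orders where the sign is flipped.
s = np.where(is_limit, -eps, eps)

acf_eps = sign_autocorrelation(eps, 2)   # alternating: anti-correlated at lag 1
acf_s = sign_autocorrelation(s, 3)       # all events sit at the ask: fully persistent
```

Real data are of course far noisier, but the same flip of the limit-order signs is what turns the short-ranged $\epsilon$ correlations into the long-ranged $s$ correlations described above.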



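[Figure 2 plot. Curves: $\epsilon$, large tick; $\epsilon$, small tick; $s$, large tick; $s$, small tick; reference slope $-0.7$. Horizontal axis: $\ell$ (events).]

Figure 2: $\langle \epsilon_{t+\ell} \cdot \epsilon_t \rangle$ and $\langle s_{t+\ell} \cdot s_t \rangle$, averaged for large and small tick stocks.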



4.2. The signed event-event correlation functions 



We will see in the following that for describing price impact the most important correlation functions are those
defined between two (not necessarily different) signed event types. For some fixed $\pi_1$ and $\pi_2$ one can define the
Figure 3: The normalized, signed event correlation functions $C_{\pi_1,\pi_2}(\ell)$, (left) $\pi_1$ = MO^0, (right) $\pi_1$ = MO'. The curves are
labeled by their respective $\pi_2$'s in the legend. The bottom panels show the negative values. Horizontal axes: $\ell$ (events).



normalized correlation between these signed events as: 

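$$C_{\pi_1,\pi_2}(\ell) = \frac{\langle I(\pi_t = \pi_1)\,\epsilon_t\, I(\pi_{t+\ell} = \pi_2)\,\epsilon_{t+\ell} \rangle}{P(\pi_1)\,P(\pi_2)}.$$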
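A direct estimator of this correlation from a sequence of typed, signed events might look as follows. This is a sketch, not the authors' code: the name `signed_event_correlation` is hypothetical, and the normalization by $P(\pi_1) P(\pi_2)$ is an assumption about how "normalized" is meant here.

```python
import numpy as np

def signed_event_correlation(types, eps, pi1, pi2, max_lag):
    """Estimate <I(pi_t = pi1) eps_t I(pi_{t+l} = pi2) eps_{t+l}> / (P(pi1) P(pi2))
    for l = 1, ..., max_lag. The first index refers to the earlier event."""
    types = np.asarray(types)
    eps = np.asarray(eps, dtype=float)
    first = (types == pi1) * eps          # signed indicator of pi1 events
    second = (types == pi2) * eps         # signed indicator of pi2 events
    norm = np.mean(types == pi1) * np.mean(types == pi2)
    T = len(eps)
    raw = np.array([np.mean(first[:T - l] * second[l:]) for l in range(1, max_lag + 1)])
    return raw / norm

# Toy sequence: every MO0 is followed one event later by an LO0 of the same sign.
types = ["MO0", "LO0"] * 3
eps = [1, 1, 1, 1, 1, 1]
C = signed_event_correlation(types, eps, "MO0", "LO0", 2)
```

On this toy input the correlation is positive at lag 1 and vanishes at lag 2, since limit orders never occur two events after a market order; swapping `pi1` and `pi2` generally gives a different curve, which is how a lack of time reversal symmetry shows up.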

Our convention is that the first index corresponds to the first event in chronological order. Because we have 6
event types, altogether there are $6^2 = 36$ of these event-event correlation functions. There are no clearly
apparent, systematic differences between large and small tick stocks, hence we give results averaged over both groups
in Fig. 3 for $\pi_1 = \mathrm{MO}^0$ and $\pi_1 = \mathrm{MO}'$. (Other correlation functions are plotted in the
Appendix.) Trades among themselves, regardless of group, are long-range correlated, as is well known and was recalled
above, and is confirmed again in Fig. 3. For other cases, the sign of the correlations between event types varies, and
in many cases one observes a similarly slow decay that can be fitted by a power law with an exponent around 0.5.
Furthermore, there are two distinctly different regimes. For $\ell < 100$ events (which means up to 10-20 seconds in
real time) returns are still autocorrelated (cf. Fig. 2). In this regime $C_{\mathrm{MO}^0,\pi_2}(\ell)$ is positive
for any event type $\pi_2$, so small trades are followed by a ballistic move in the same direction by other trades and
also by limit orders, while at the same time cancellations also push the price in the same direction.
$C_{\mathrm{MO}',\pi_2}(\ell)$ is also positive except for $\mathrm{LO}'$, where it is negative except for very small
lags*. This means that if a market order removes a level, it is followed by further trades and cancellations in the
same direction, but the level is refilled very quickly by incoming limit orders inside the spread. For longer times
some correlation functions change sign. For example, in Fig. 3 (left) one can see this reversal for limit orders.
Market orders "attract" limit orders, as noted in [13, 17]. This "stimulated refill" process ensures a form of dynamic
equilibrium: the correlated flow of market orders is offset by an excess inflow of opposing limit orders, such as to
maintain the diffusive nature of the price. This is the same process causing the long-range correlations of $s_t$
noted above.

In general, there are no reasons to expect time reversal symmetry, which would impose
$C_{\pi_1,\pi_2}(\ell) = C_{\pi_2,\pi_1}(\ell)$. However, some pairs of events appear to obey this symmetry at least
approximately, for example $\mathrm{MO}^0$ and $\mathrm{CA}^0$, or $\mathrm{MO}'$ and $\mathrm{CA}'$, see Fig. 4. On
the other hand, for the pair $\mathrm{MO}'$, $\mathrm{LO}'$ one can see that limit orders that move the price are
immediately followed by opposing market orders. The dual compensation, i.e. a stimulated refill of liquidity after a
price moving market order $\mathrm{MO}'$, only happens with some delay. $\mathrm{MO}^0$ and limit orders also lead to
some asymmetry, see Fig. 5: here we see that, after a transient, non-aggressive market orders induce compensating
limit orders more efficiently than the reverse process.

* There is some sign of oscillations for small tick stocks.




Figure 4: Examples for time reversal symmetry for normalized, signed event correlations for small tick stocks; note
that it is $-C_{\pi_1,\pi_2}(\ell)$ that is plotted. Lines and points of the same color correspond to the same event
pairs. The curves are labeled by their respective $\pi_1$'s and $\pi_2$'s in the legend. Horizontal axis: $\ell$
(events).

Figure 5: Examples for time reversal asymmetry for normalized, signed event correlations for small tick stocks. Lines
and points of the same color correspond to the same event pairs. The curves are labeled by their respective $\pi_1$'s
and $\pi_2$'s in the legend. Horizontal axis: $\ell$ (events).



4.3. The unsigned event-event correlation functions 



A similar definition of a correlation function is possible purely between event occurrences, without the signs:

$$ \Pi_{\pi_1,\pi_2}(\ell) = \frac{P(\pi_{t+\ell} = \pi_2 \,|\, \pi_t = \pi_1)}{P(\pi_2)} - 1 \equiv \frac{\langle I(\pi_t = \pi_1)\, I(\pi_{t+\ell} = \pi_2) \rangle}{P(\pi_1)\,P(\pi_2)} - 1 , $$

where we have subtracted 1 such as to make the function decay to zero at large times. This quantity expresses the
excess probability of $\pi_2$-type events in comparison to their stationary probability, given that there was a
$\pi_1$-type event $\ell$ lags earlier. Examples of this quantity for averages over all stocks are plotted in Fig. 6.
One finds that generally $\Pi_{\pi_1,\pi_2}(\ell)$ decays slower when both $\pi_1$ and $\pi_2$ move the price. This
implies that events which change the best price are clustered in time: aggressive orders induce and reinforce each
other.

Figure 6: The normalized, unsigned event correlation functions $\Pi_{\pi_1,\pi_2}(\ell)$, (left)
$\pi_1 = \mathrm{MO}^0$, (right) $\pi_1 = \mathrm{MO}'$. The curves are labeled by their respective $\pi_2$'s in the
legend. The bottom panels show the negative values.
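As an illustration of the two definitions above, the following is a minimal sketch of how the signed and unsigned
correlation functions could be estimated from an event series. The function name `event_correlations`, the integer
coding of the event types and the array layout are assumptions made for this example only; the text does not specify
an implementation. The first index is the earlier event, as in the convention used in the text.

```python
import numpy as np

def event_correlations(types, signs, n_types, max_lag):
    """Estimate C[pi1, pi2, lag] and Pi[pi1, pi2, lag] for lag = 0..max_lag.

    types : integer event type per event, coded 0..n_types-1 (hypothetical coding)
    signs : event sign (+1 or -1) per event
    Assumes every event type occurs at least once, so that P > 0.
    """
    types = np.asarray(types)
    signs = np.asarray(signs, dtype=float)
    T = len(types)
    ind = np.zeros((T, n_types))            # indicators I(pi_t = pi)
    ind[np.arange(T), types] = 1.0
    P = ind.mean(axis=0)                    # stationary probabilities P(pi)
    signed = ind * signs[:, None]           # I(pi_t = pi) * eps_t
    norm = np.outer(P, P)
    C = np.zeros((n_types, n_types, max_lag + 1))
    Pi = np.zeros_like(C)
    for lag in range(max_lag + 1):
        n = T - lag
        # rows: type of the earlier event, columns: type of the later event
        C[:, :, lag] = signed[:n].T @ signed[lag:] / n / norm
        Pi[:, :, lag] = ind[:n].T @ ind[lag:] / n / norm - 1.0
    return C, Pi, P
```

At lag zero the estimator reduces to `C[pi, pi, 0] = 1 / P[pi]`, which is a quick sanity check on the normalization.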



4.4. The response function 



Let us now turn to the response of the price to different types of orders. The average behavior of the price after
events of a particular type $\pi$ defines the corresponding response function (or average impact function):

$$ R_\pi(\ell) = \langle (p_{t+\ell} - p_t) \cdot \epsilon_t \,|\, \pi_t = \pi \rangle . \tag{15} $$

This is a correlation function between "sign times indicator" $\epsilon_t I(\pi_t = \pi)$ at time $t$ and the price
change from $t$ to $t+\ell$, normalized by the stationary probability of the event $\pi$, denoted as
$P(\pi) = \langle I(\pi_t = \pi) \rangle$. This normalized response function gives the expected directional price
change after an event $\pi$. Its behavior for all $\pi$'s is shown in Fig. 7. We note that all types of events lead,
on average, to a price change in the expected direction. Tautologically, $R_\pi(\ell = 1) > 0$ for price changing
events and $R_\pi(\ell = 1) = 0$ for other events. As the time lag $\ell$ increases, the impact of market orders grows
significantly, especially for small tick stocks, whereas it remains roughly constant for limit orders/cancellations
that do change the price. However, as emphasized in earlier work, the response function is hard to interpret
intuitively, and in particular is not equal to the bare impact of an event, since the correlations between events
contribute to $R_\pi(\ell)$, as discussed above. We now attempt to deconvolute the effect of correlations and extract
these bare impact functions from the data.
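The definition in Eq. (15) translates directly into a conditional average. The sketch below is only an illustration:
the function name `response_function` and the convention that `mid[t]` is the midquote immediately before event `t`
(so that the price change over one lag contains the immediate effect of the event itself) are assumptions of this
example.

```python
import numpy as np

def response_function(mid, types, signs, event_type, max_lag):
    """Estimate R_pi(lag) = <(p_{t+lag} - p_t) * eps_t | pi_t = pi> for lag = 1..max_lag.

    mid[t] is assumed to be the midquote just before event t (hypothetical convention).
    Entry 0 of the returned array is left at zero.
    """
    mid = np.asarray(mid, dtype=float)
    types = np.asarray(types)
    signs = np.asarray(signs, dtype=float)
    T = len(types)
    R = np.zeros(max_lag + 1)
    for lag in range(1, max_lag + 1):
        n = T - lag
        mask = types[:n] == event_type          # condition on pi_t = pi
        change = mid[lag:] - mid[:n]            # p_{t+lag} - p_t
        R[lag] = np.mean(change[mask] * signs[:n][mask])
    return R
```

With this convention, an event type that never moves the price has a vanishing response at lag 1, in line with the
remark on $R_\pi(\ell = 1)$ above.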



5. THE TEMPORARY IMPACT MODEL 



Market orders move prices, but so do cancellations and limit orders. As reviewed in Sec. 2 above, one can try to
describe the impact of all these events in an effective way in terms of a "dressed" propagator of market orders only,
$G(\ell)$, as defined there. Let us extend this formalism to include any number of events in the following way. We
assume that, after a lag of $\ell$ events, an event of type $\pi$ has a remaining impact $G_\pi(\ell)$. The price is
then expressed as the sum of the impacts of all past events, plus some initial reference price:

$$ p_t = \sum_{t'<t} G_{\pi_{t'}}(t-t')\,\epsilon_{t'} + p_{-\infty} , \qquad G_{\pi_{t'}}(\ell) \equiv \sum_{\pi} I(\pi_{t'} = \pi)\, G_\pi(\ell) , \tag{16} $$

where the term with the indicators selects exactly one propagator for each $t'$, the one corresponding to the
particular event type at that time.

Figure 7: The normalized response function $R_\pi(\ell)$ for (left) large tick stocks and (right) small tick stocks.
The curves are labeled according to $\pi$ in the legend.

After straightforward calculations, the response function (15) can be expressed through Eq. (16) and the signed
correlation functions $C_{\pi_1,\pi_2}$ as

$$ R_{\pi_1}(\ell) = \sum_{\pi_2} P(\pi_2) \left[ \sum_{0<n\le\ell} G_{\pi_2}(n)\, C_{\pi_1,\pi_2}(\ell-n) + \sum_{n>\ell} G_{\pi_2}(n)\, C_{\pi_2,\pi_1}(n-\ell) - \sum_{n>0} G_{\pi_2}(n)\, C_{\pi_2,\pi_1}(n) \right] . \tag{17} $$

This is a direct extension of the result obtained previously for market orders alone. One can invert the system of
equations in (17) to evaluate the unobservable $G_\pi$'s in terms of the observable $R_\pi$'s and
$C_{\pi_1,\pi_2}$'s. In order to do this, one rewrites the above in a matrix form, as

$$ R_{\pi_1}(\ell) = \sum_{\pi_2} P(\pi_2) \sum_{n>0} A^{\pi_1,\pi_2}_{n,\ell}\, G_{\pi_2}(n) , \tag{18} $$

where

$$ A^{\pi_1,\pi_2}_{n,\ell} = \begin{cases} C_{\pi_1,\pi_2}(\ell-n) - C_{\pi_2,\pi_1}(n), & \text{if } 0 < n \le \ell \le L, \\ C_{\pi_2,\pi_1}(n-\ell) - C_{\pi_2,\pi_1}(n), & \text{if } 0 < \ell < n \le L, \end{cases} \tag{19} $$

and $\infty$ was replaced by a large enough cutoff $L$, convenient for numerical purposes. In the following, we use
$L = 1000$, which allows one to determine the functions $G_\pi$ with a good precision up to $\ell \approx 300$, see
Fig. 8.
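The inversion described above amounts to solving one linear system in the unknown propagator values. A minimal sketch
is given below; it assumes that the correlation functions are stored as an array `C[pi1, pi2, lag]` for lags
$0, \ldots, L$ (with `C[pi, pi, 0] = 1 / P[pi]`), that the response is stored as `R[pi, lag]`, and that the stationary
probabilities enter the linear system explicitly. The function names are hypothetical, and the plain Python loops are
chosen for readability: for a cutoff as large as $L = 1000$ a vectorized construction would be needed in practice.

```python
import numpy as np

def build_system(C, P, L):
    """Matrix M such that R_flat = M @ G_flat, with flat index pi * L + (lag - 1)."""
    n_types = len(P)
    M = np.zeros((n_types * L, n_types * L))
    for p1 in range(n_types):
        for p2 in range(n_types):
            for ell in range(1, L + 1):
                for n in range(1, L + 1):
                    if n <= ell:
                        a = C[p1, p2, ell - n] - C[p2, p1, n]
                    else:
                        a = C[p2, p1, n - ell] - C[p2, p1, n]
                    M[p1 * L + ell - 1, p2 * L + n - 1] = P[p2] * a
    return M

def solve_propagators(R, C, P, L):
    """Return G[pi, n - 1] for n = 1..L from R[pi, lag] (lag = 0..L) and C[pi1, pi2, lag]."""
    M = build_system(C, P, L)
    rhs = np.asarray(R, dtype=float)[:, 1:L + 1].reshape(-1)
    G = np.linalg.solve(M, rhs)
    return G.reshape(len(P), L)
```

For uncorrelated events the matrix reduces to the identity, so that the recovered propagator simply equals the
response function.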

As discussed in Sec. 2, the origin of the decay of market order price impact is that incoming limit orders maintain
an equilibrium with market order flow. In order to keep prices diffusive, limit orders introduce a reverting force
into prices, and this precisely offsets the persistence in market order flow. However, our present extended formalism
explicitly includes these limit orders (and also cancellations) as events. If all order book events were described,
one would naively expect the $G_\pi$'s to be lag-independent constants for events that change the price, and zero
otherwise. Solving the above equation for the $G_\pi$'s, however, leads to functions that still depend on the lag
$\ell$, particularly for small tick stocks: see Fig. 8. We see in particular that market orders that do not change the
price immediately do impact the price on longer time scales. We also notice that the impact of single $\mathrm{MO}'$,
$\mathrm{MO}^0$ events first grows with lag and then decays slowly. The impact of limit orders, although clearly
measurable, seems to be significantly smaller than that of market orders, in particular for small tick stocks (a
related discussion can be found in the literature).

Figure 8: The bare propagators $G_\pi(\ell)$ in the temporary impact model for (left) large tick stocks and (right)
small tick stocks.

In the rest of the paper, we will try to understand in more detail where the lag dependence of the $G_\pi$'s comes
from. The discussion of Sec. 2 already suggested that some history dependence of impact is responsible for this
effect. Before delving into this, it is interesting to see how well the above augmented model predicts the volatility
of the stocks once all the $G_\pi$'s have been calibrated on the empirical $R_\pi$'s. As just mentioned, Eq. (16)
neglects the fluctuations of the impact, and we therefore expect some discrepancies. In order to make such a
comparison, we first express exactly the variance of the price at lag $\ell$,
$D(\ell) = \langle (p_{t+\ell} - p_t)^2 \rangle$, in terms of the $G$'s and the $C$'s, generalizing the corresponding
result obtained in earlier work:

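$$
\begin{aligned}
D(\ell) ={}& \sum_{0 \le n < \ell} \sum_{\pi_1} G_{\pi_1}(\ell-n)^2\, P(\pi_1) + \sum_{n>0} \sum_{\pi_1} \left[ G_{\pi_1}(\ell+n) - G_{\pi_1}(n) \right]^2 P(\pi_1) \\
&+ 2 \sum_{0 \le n < n' < \ell} \sum_{\pi_1,\pi_2} G_{\pi_1}(\ell-n)\, G_{\pi_2}(\ell-n')\, P(\pi_1) P(\pi_2)\, C_{\pi_1,\pi_2}(n'-n) \\
&+ 2 \sum_{0 < n < n'} \sum_{\pi_1,\pi_2} \left[ G_{\pi_1}(\ell+n) - G_{\pi_1}(n) \right] \left[ G_{\pi_2}(\ell+n') - G_{\pi_2}(n') \right] P(\pi_1) P(\pi_2)\, C_{\pi_2,\pi_1}(n'-n) \\
&+ 2 \sum_{0 \le n < \ell} \sum_{n'>0} \sum_{\pi_1,\pi_2} G_{\pi_1}(\ell-n) \left[ G_{\pi_2}(\ell+n') - G_{\pi_2}(n') \right] P(\pi_1) P(\pi_2)\, C_{\pi_2,\pi_1}(n'+n) .
\end{aligned}
\tag{20}
$$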

The function $D(\ell)/\ell$, which should be constant for a strictly diffusive process, is plotted in Fig. 9: the
symbols indicate the empirical data, and the dashed lines correspond to Eq. (20). Note that we fit both models to each
stock separately, compute $D(\ell)/\ell$ in each case, and then average the results. We see that the overall agreement
is fair for small tick stocks, but very bad for large tick stocks. The reason will turn out to be that for large
ticks, a permanent, non-fluctuating impact model accounts very well for the dynamics. This reflects the fact that the
spread and the gaps behind the best quotes are nearly constant in that case. But any small variation of $G_\pi$ is
amplified through the second term of Eq. (20), which is an infinite sum of positive terms. Hence it is much better to
work backwards and test a model where the single event propagator is assumed to be strictly constant over time, as we
will explain in the next section.
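The empirical counterpart of the diffusion curve discussed above is straightforward to compute. The following sketch
is an illustration only; the function name `diffusion_curve` and the use of a single midquote series sampled in event
time are assumptions of this example.

```python
import numpy as np

def diffusion_curve(mid, lags):
    """Return D(lag) / lag with D(lag) = <(p_{t+lag} - p_t)^2>, for each lag >= 1 in `lags`."""
    mid = np.asarray(mid, dtype=float)
    return np.array([np.mean((mid[lag:] - mid[:-lag]) ** 2) / lag for lag in lags])
```

For a strictly diffusive price the returned values are flat in the lag, whereas a mean-reverting price (for instance
one that merely bounces between two levels) gives a decreasing curve.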

6. A CONSTANT IMPACT MODEL 

In the above section we found that the single event propagators $G_\pi$ appear to have a non-trivial time dependence.
Another way to test this result is to invert the logic and assume first that the $G_\pi$ are time independent and see
how well, or how badly, this assumption fares at accounting for the shape of the response functions $R_\pi(\ell)$ and
of the price diffusion $D(\ell)$.

Let us start from the following exact formula for the midpoint price:

$$ p_{t+\ell} = p_t + \sum_{t \le t' < t+\ell} \Delta_{\pi_{t'},\epsilon_{t'},t'}\, \epsilon_{t'} . \tag{21} $$

Here $\Delta_{\pi,\epsilon_{t'},t'}$ denotes the price change at time $t'$ if an event of type $\pi$ happens. This
$\Delta$ can also depend on the sign $\epsilon_{t'}$. For example, if $\pi = \mathrm{MO}'$ and $\epsilon_{t'} = -1$,
this means that at $t'$ a sell market order executed the total volume at the bid. The midquote price change is
$-\Delta_{\mathrm{MO}',-1,t'}$, which usually means that the second best level was at
$b_{t'} - 2\Delta_{\mathrm{MO}',-1,t'}$, where $b_{t'}$ is the bid price before the event. The factor 2 is necessary
because the ask did not change, and the impact is defined by the change of the midquote. Hence the
$\Delta_{\mathrm{MO}'}$'s (and similarly the $\Delta_{\mathrm{CA}'}$'s) correspond to half of the gap between the first
and the second best quote just before the level was removed (see also the related literature). Another example is when
$\pi = \mathrm{LO}'$ and $\epsilon_{t'} = -1$. This means that at $t'$ a sell limit order was placed inside the
spread. The midquote price change is $-\Delta_{\mathrm{LO}',-1,t'}$, which means that the limit order was placed at
$a_{t'} - 2\Delta_{\mathrm{LO}',-1,t'}$, where $a_{t'}$ is the ask price. Thus the $\Delta_{\mathrm{LO}'}$'s correspond
to half of the gap between the first and the second best quote right after the limit order was placed. In the
following we will call the $\Delta$'s gaps. Note that the events $\mathrm{MO}^0$, $\mathrm{CA}^0$ and $\mathrm{LO}^0$
do not change the price, so their respective gaps are always zero: there are only three types of $\Delta$'s that are
non-zero.

Figure 9: $D(\ell)/\ell$ and its approximations for the two groups of stocks. For small tick stocks the values were
divided by 10 for clarity. Symbols correspond to the empirical result. Dashed lines correspond to the temporary impact
model with all 6 events and they are calculated from Eq. (20). The agreement is acceptable for small tick stocks, but
very poor for large tick ones. Solid lines correspond to the constant impact model, see Eq. (23) below; in this case
the agreement with large tick stocks is nearly perfect.

The permanent impact model is defined by replacing the time dependent $\Delta$'s by their average values. More
precisely, let us introduce the average realized gap:

$$ \Delta^R_\pi = \langle \Delta_{\pi,\epsilon_t,t} \,|\, \pi_t = \pi \rangle . \tag{22} $$

The conditional expectation means that the gaps are sampled only when the price change corresponding to that
particular kind of gap is truly realized. Therefore, in general
$\Delta^R_\pi \neq \langle \Delta_{\pi,\epsilon_t,t} \rangle$, see Table III, where one sees that the realized gap
when a market order moves the price is in fact larger than the unconditional average. The logic is that the opening of
a large gap behind the ask is a motivation for buying rapidly (or cancelling rapidly for sellers) before the price
moves up.

Our approximate constant impact model then reads:

$$ p_{t+\ell} = p_t + \sum_{t \le t' < t+\ell} \Delta^R_{\pi_{t'}}\, \epsilon_{t'} . \tag{23} $$

The response functions are then, by using Eq. (12), easily given by:

$$ R_\pi(\ell) = \langle (p_{t+\ell} - p_t) \cdot \epsilon_t \,|\, \pi_t = \pi \rangle = \sum_{0 \le t' < \ell} \sum_{\pi_1} \Delta^R_{\pi_1}\, P(\pi_1)\, C_{\pi,\pi_1}(t') . \tag{24} $$

The formula (24) is quite simple to interpret. We fixed that the event that happened at $t$ was of type $\pi$. Let us
now express $C_{\pi,\pi_1}(\ell)$ as:

$$ P(\pi)\,P(\pi_1)\,C_{\pi,\pi_1}(\ell) \propto P(\pi_{t+\ell} = \pi_1, \epsilon_{t+\ell} = \epsilon_t \,|\, \pi_t = \pi) - P(\pi_{t+\ell} = \pi_1, \epsilon_{t+\ell} = -\epsilon_t \,|\, \pi_t = \pi) . \tag{25} $$

This represents the following: given that the event at $t$ was of type $\pi$ and the event at $t + \ell$ is of type
$\pi_1$, how much more probable is it that the direction of the second event is the same as that of the first event?
The total price response to some event can be understood as its own impact (lag zero), plus the sum of the biases in
the course of future events, conditional on this initial event. These biases are multiplied by the average price
change $\Delta^R$ that these induced future events cause. Of course, correlation does not mean causality, and we
cannot a priori distinguish between events that are induced by the initial event, and those that merely follow the
initial event (see [19] for a related discussion). However, it seems reasonable to assume that there is a true
causality chain between different types of events occurring on the same side of the book (i.e. a limit order refilling
the best quote after a market order).

              ticker   2 Delta^R_MO'   2 Delta^R_CA'   2 Delta^R_LO'   2 <Delta_MO'>
  large tick  AMAT         1.02            1.04            1.02            1.00
              CMCSA        1.03            1.14            1.06            1.00
              CSCO         1.01            1.02            1.01            1.00
              DELL         1.01            1.05            1.02            1.00
              INTC         1.00            1.01            1.01            1.00
              MSFT         1.01            1.02            1.01            1.00
              ORCL         1.01            1.02            1.02            1.00
  small tick  AAPL         1.31            1.27            1.27            1.14
              AMZN         1.51            1.22            1.30            1.17
              APOL         1.76            1.50            1.52            1.42
              COST         1.35            1.23            1.24            1.15
              ESRX         1.85            1.54            1.60            1.45
              GILD         1.11            1.13            1.11            1.03

Table III: Mean realized gaps and unconditional gaps in ticks for all stocks. All values were multiplied by 2, so that
they correspond to the instantaneous change of the bid/ask and not of the midquote. Note that
$\langle \Delta_{\mathrm{MO}'} \rangle = \langle \Delta_{\mathrm{CA}'} \rangle$, while
$\langle \Delta_{\mathrm{LO}'} \rangle$ is not observable.

Figure 10: Comparison of true and approximated normalized response functions $R_\pi(\ell)$, using the constant gap
model, for (left) large tick stocks and (right) small tick stocks, for events that do not change the price. Symbols
correspond to the true value, and lines to the approximation. The data are labeled according to $\pi$ in the legend.
Let us now take Eq. (24), and check how well the true response functions are described by the above constant impact
model. Figs. 10 and 11 show that the agreement is nearly perfect for large tick stocks, except for $\mathrm{CA}'$
events, but these events are very rare (less than about 0.2%). This agreement is expected because the order book is
usually so dense that gaps hardly fluctuate at all; the small remaining discrepancies will in fact be cured below. The
quality of the agreement suggests that the time dependence of the bare impact function $G_\pi$ obtained in Sec. 5
above is partly a numerical artefact coming from the "brute force" inversion of Eq. (18).

For small ticks on the other hand, noticeable deviations are observed as expected, and call for an extension of the
model. This will be the focus of the next sections. One can extend the above model in yet another direction, by
studying the dynamics of the spread rather than the dynamics of the mid-point, see the Appendix.

One can approximate the volatility within the same model as

$$ D(\ell) = \langle (p_{t+\ell} - p_t)^2 \rangle \approx \sum_{0 \le t', t'' < \ell} \sum_{\pi_1} \sum_{\pi_2} P(\pi_1)\, P(\pi_2)\, C_{\pi_1,\pi_2}(t' - t'')\, \Delta^R_{\pi_1} \Delta^R_{\pi_2} , \tag{26} $$

where negative lags are understood through $C_{\pi_1,\pi_2}(-\ell) = C_{\pi_2,\pi_1}(\ell)$, which follows from the
definition of $C$ and stationarity. As shown in Fig. 9, the constant gap model is very precise for large tick stocks
(as again expected), but clear discrepancies are visible for small tick ones.

Figure 11: Comparison of true and approximated normalized response functions $R_\pi(\ell)$, using the constant gap
model, for (left) large tick stocks and (right) small tick stocks, for events that change the price. Symbols
correspond to the true value, and lines to the approximation. The data are labeled according to $\pi$ in the legend.



7. THE GAP DYNAMICS OF SMALL TICK STOCKS 
7.1. A linear model for gap fluctuations 



Let us now try to better understand how gap fluctuations contribute to the response function, and why replacing the
gap by its average realized value is not a good approximation for small tick stocks. By definition, without the
constant gap approximation, the response function contains contributions which have the form

$$ \langle \Delta_{\pi_2,\epsilon_{t+\ell},t+\ell}\; \epsilon_{t+\ell}\, \epsilon_t \,|\, \pi_t = \pi_1, \pi_{t+\ell} = \pi_2 \rangle . $$

After using some basic properties of the event signs (namely $\epsilon^2 = 1$, so that
$\Delta_{\pi,\epsilon,t} = \frac{1}{2}(\Delta_{\pi,+,t} + \Delta_{\pi,-,t}) + \frac{\epsilon}{2}(\Delta_{\pi,+,t} - \Delta_{\pi,-,t})$,
where $\Delta_{\pi,\pm,t}$ denotes the gap for $\epsilon = \pm 1$), this quantity can be written as a sum over three
contributions:

1. Firstly, there is the term from the constant gap approximation:

$$ \Delta^R_{\pi_2}\, \langle \epsilon_{t+\ell}\, \epsilon_t \,|\, \pi_t = \pi_1, \pi_{t+\ell} = \pi_2 \rangle . $$

This contains the leading-order effect of event-event correlations.

2. There is a second term that we write as:

$$ \Big\langle \underbrace{\tfrac{1}{2} \left( \Delta_{\pi_2,+,t+\ell} - \Delta_{\pi_2,-,t+\ell} \right) \epsilon_t}_{(a)} \,\Big|\, \pi_t = \pi_1, \pi_{t+\ell} = \pi_2 \Big\rangle , $$

which is the conditional expectation value of the quantity (a). If (a) is positive, then after an upward price move
consecutive upward moves are larger than downward ones, while if (a) is negative then they are smaller. This process
can thus either accelerate or dampen the growth of the response function.

3. The third contribution is of the form

$$ \Big\langle \underbrace{\left[ \tfrac{1}{2} \left( \Delta_{\pi_2,+,t+\ell} + \Delta_{\pi_2,-,t+\ell} \right) - \Delta^R_{\pi_2} \right]}_{(b)} \underbrace{\epsilon_{t+\ell}\, \epsilon_t}_{(c)} \,\Big|\, \pi_t = \pi_1, \pi_{t+\ell} = \pi_2 \Big\rangle . $$

Here (b) is positive when the average of the two gaps (up and down) is greater than the time averaged realized value;
(c) is positive when the two events move the price in the same direction. Thus the full term gives a positive
contribution to the response function if two "parallel" events are correlated with larger gaps and hence decreased
liquidity at the time of the second event, while opposing events correspond to increased liquidity at the time of the
second event. The final effect of this term agrees with the previous one: if (b) $\times$ (c) is positive, then after
an upward price move the consecutive upward moves become larger than downward ones and vice versa.

At this point we need a dynamical model for the $\Delta$'s, to quantify the above correlations, but we are faced with
the difficulty that $\Delta_{\pi,\epsilon,t}$ is only observed for $\pi = \pi_t$ and $\epsilon = \epsilon_t$. What we
will do instead is to write a simple regression model directly for the observable quantity
$\Delta_{\pi,\epsilon_t,t}\, I(\pi_t = \pi)\, \epsilon_t$, which can be evaluated from data. Then, based on this
knowledge, we will revisit the influence of gap fluctuations on the price dynamics in Sec. 7.3.

7.2. A linear model for gap fluctuations 

The correlation between events has a dynamical origin: market orders and cancellations attract replacement limit
orders and vice versa. Eq. (21) is the exact time evolution of the price written as a sum of the random variables
$\Delta_{\pi,\epsilon_t,t}\, I(\pi_t = \pi)\, \epsilon_t$. We will postulate that both the realized gap
$\Delta_{\pi,\epsilon_t,t}$ and the order flow $I(\pi_t = \pi)\,\epsilon_t$ are influenced by the past order flow
$I(\pi_{t'} = \pi_1)\,\epsilon_{t'}$, $t' < t$, in a linear fashion, i.e.:

$$ \Delta_{\pi,\epsilon_t,t}\, I(\pi_t = \pi)\, \epsilon_t = \sum_{t'<t} \sum_{\pi_1} K_{\pi_1,\pi}(t - t')\, I(\pi_{t'} = \pi_1)\, \epsilon_{t'} + \eta_{\pi,t} , \tag{27} $$

where all $\eta$'s are independent noise variables. Similarly, we write for the three price changing events
$\mathrm{MO}'$, $\mathrm{LO}'$ and $\mathrm{CA}'$:

$$ \Delta^R_\pi\, I(\pi_t = \pi)\, \epsilon_t = \sum_{t'<t} \sum_{\pi_1} \tilde{K}_{\pi_1,\pi}(t - t')\, I(\pi_{t'} = \pi_1)\, \epsilon_{t'} + \tilde{\eta}_{\pi,t} , \tag{28} $$

with other noise variables $\tilde{\eta}$, and we introduced $\Delta^R_\pi$ for later convenience. Note that the above
equations are again of the vector autoregression type, where the kernels $K$ and $\tilde{K}$ have a $3 \times 6$
matrix structure.

Both models (27) and (28) can be calibrated to the data by using the same trick as in Sec. 5: forming expectation
values on both sides and solving a set of linear equations between correlation functions, for example for $K$:

$$ \langle \Delta_{\pi,\epsilon_{t+\ell},t+\ell}\, I(\pi_{t+\ell} = \pi)\, \epsilon_{t+\ell}\, I(\pi_t = \pi_1)\, \epsilon_t \rangle = \sum_{t' < t+\ell} \sum_{\pi_2} K_{\pi_2,\pi}(t + \ell - t')\, \langle I(\pi_{t'} = \pi_2)\, \epsilon_{t'}\, I(\pi_t = \pi_1)\, \epsilon_t \rangle , \tag{29} $$

except this time we have three separate solutions for $\pi = \mathrm{MO}'$, $\mathrm{CA}'$ and $\mathrm{LO}'$. An
example of the solution kernels $K$ is given in Fig. 12; the sign of these kernels is expected from what we learned
above. We see for example that a market order tends to make a future $\mathrm{MO}'$ more probable, and with an
increased gap, which makes sense. The same can be repeated with respect to Eq. (28) to calculate the $\tilde{K}$'s.
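One possible way to calibrate such a kernel in practice is an ordinary least-squares regression of the target series
on the lagged signed indicators; the normal equations of that regression are the sample analogue of Eq. (29). The
sketch below takes this route as an assumption of the example rather than as the procedure actually used for the
figures, and the names `fit_kernel`, `y` and `x` are hypothetical: `y[t]` stands for
$\Delta_{\pi,\epsilon_t,t}\, I(\pi_t = \pi)\, \epsilon_t$ for one fixed $\pi$, and `x[t, k]` for
$I(\pi_t = k)\, \epsilon_t$.

```python
import numpy as np

def fit_kernel(y, x, n_lags):
    """Least-squares estimate of kernel[pi1, d - 1], d = 1..n_lags, in
    y[t] = sum_{d, pi1} kernel[pi1, d - 1] * x[t - d, pi1] + noise.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    T, n_types = x.shape
    # block d of the design matrix holds x[t - d] aligned with y[t], for t = n_lags..T-1
    blocks = [x[n_lags - d:T - d] for d in range(1, n_lags + 1)]
    design = np.hstack(blocks)                      # shape (T - n_lags, n_lags * n_types)
    coef, *_ = np.linalg.lstsq(design, y[n_lags:], rcond=None)
    return coef.reshape(n_lags, n_types).T          # shape (n_types, n_lags)
```

The same routine applied to $\Delta^R_\pi\, I(\pi_t = \pi)\, \epsilon_t$ as the target would give the second kernel of
Eq. (28). The finite number of lags plays the role of the cutoff used earlier.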

Figure 12: Estimates of $K_{\pi,\mathrm{MO}'}(\ell)$ for small ticks.

An important aspect of these VAR models is that once we have an estimate for their kernels, they can be used for
forecasting the future price changes caused by each component of the event flow based on the events that occurred in
the recent past [1]. Eq. (27) prescribes for us an estimate for conditional expectation values such as
$\langle \Delta_{\mathrm{MO}',\epsilon_t,t}\, I(\pi_t = \mathrm{MO}')\, \epsilon_t \,|\, \cdots \rangle$, which is the
expected price change due to a market order in the next event (times the probability of such an outcome), and the
conditioning is on past signs and indicators. We can proceed similarly for $\mathrm{CA}'$ and $\mathrm{LO}'$, and
finally the sum of the three components gives the expected price change in the next event.

Such forecasts based on Eqs. (27), (28) perform surprisingly well in practice. Fig. 13 shows that the expectation
value of the left hand side of Eq. (27) is a monotonic function of our prediction, and the relationship on average can
be fitted with a straight line with slope 1, although small higher order (cubic) corrections seem to be present as
well. Similar results can be found for Eq. (28) and the $\tilde{K}$'s.

Figure 13: Performance of Eq. (27) for small ticks. Both axes normalized by standard deviation of predictor.
Horizontal axis: linear prediction of $\Delta_{\pi,\epsilon_t,t}\, I(\pi_t = \pi)\, \epsilon_t$ (normalized).



7.3. The final model for small ticks 



The above analysis suggests a way to build and calibrate an impact model that describes in a consistent way (a) all
types of events and (b) the history dependence of the gaps, as we argued to be necessary in Sec. 2. The discussion of
the previous section motivates the following model:

$$ p_{t+\ell} = p_t + \sum_{t \le t' < t+\ell} \Big[ \Delta^R_{\pi_{t'}} + \sum_{t'' < t'} \kappa_{\pi_{t''},\pi_{t'}}(t' - t'')\, \epsilon_{t''}\, \epsilon_{t'} \Big] \epsilon_{t'} , \tag{30} $$

where $\kappa_{\pi_1,\pi_2}$ is a kernel that models the fluctuations of the gaps and their history dependence, which
will be chosen such that the bare propagator of the model is given by the corresponding equation above.

The model specification, Eq. (30), is the central result of this paper. It can be seen as a permanent impact model,
but with some history dependence, modeled as a linear regression on past events. By symmetry, this dependence should
only include terms containing $\epsilon_{t'} \epsilon_{t''}$, since the influence of any past string of events on the
ask must be the same as that of the mirror image of the string on the bid. More generally, one may expect higher
order, non-linear correction terms of the form

$$ \sum_{t_1, t_2, t_3 < t'} \kappa_{\pi_{t_1},\pi_{t_2},\pi_{t_3};\pi_{t'}}(t' - t_1, t' - t_2, t' - t_3)\, \epsilon_{t_1} \epsilon_{t_2} \epsilon_{t_3} \epsilon_{t'} , \tag{31} $$

or with a larger (even) number of $\epsilon$'s. We will not explore such corrections further here, although Fig. 13
suggests these terms are present.

Upon direct identification of Eq. (30) with Eq. (21), and using (27) and (28), one finds that $\kappa$ can be
expressed in terms of $K$ and $\tilde{K}$ as:

$$ \kappa_{\pi_1,\pi_2}(\ell) = K_{\pi_1,\pi_2}(\ell) - \tilde{K}_{\pi_1,\pi_2}(\ell) . \tag{32} $$

We can now compute the average response functions $R_\pi(\ell)$ and the diffusion curve $D(\ell)$ within this model,
and compare the results with empirical data.

For the response functions, the addition of the fluctuating gap term in Eq. (30) corrects the small discrepancies
found within the constant impact model for large tick stocks. It also allows one to capture very satisfactorily the
response function for small tick stocks, see Figs. 14 and 15*.

* Note that in making these plots we neglected the first 30 and last 40 minutes of trading days, so they slightly
differ from those shown in the earlier sections. The results of the constant gap model are essentially unchanged
regardless of such an exclusion.

Figure 14: Comparison of true and approximated normalized response functions $R_\pi(\ell)$ of the final model for
(left) large tick stocks and (right) small tick stocks, for events that do not change the price. Symbols correspond to
the true value, and lines to the approximation. The inaccuracy for large $\ell$ is due to a finite size effect in
matrix inversion. The data are labeled according to $\pi$ in the legend.

Figure 15: Comparison of true and approximated normalized response functions $R_\pi(\ell)$ of the final model for
(left) large tick stocks and (right) small tick stocks, for events that change the price. The inaccuracy for large
$\ell$ is due to a finite size effect in matrix inversion. Symbols correspond to the true value, and lines to the
approximation. The data are labeled according to $\pi$ in the legend.



A much more stringent test of the model is to check the behaviour of the diffusion curve $D(\ell)$. The exact
calculation in fact involves three- and four-point correlation functions, for which we have no model. A closure scheme
where these higher correlation functions are assumed to factorize yields the following approximation:

a<t',t"<£ TTi 7r2 

2 E E E(^-|^l)«,-3(^'0^-2,^3(i + rm^2)P(7r3) + 

—£<t<t 7r2,7r3 T>0 

E E E (^-|*l)<t-4(^'^''*)^-..-4(r-r' + i)P(^2)P(7r4), (33) 



where 



-l<t<i 7r2,7r4 T,r'>0 



<,,3(t,0 = E'^-^.-iM[^(* = = ^3) ^ 0)Pini)+Iit = -T)P(7ri)n,,,,(T)], (34) 
and, for $t > 0$,

I{t ^ T')P(7ri)P(^3)[n.,,.3 W + 1]}, (35) 

whereas for $t < 0$ we use the same expression with the arguments $(\tau, \tau', -t)$ replaced by $(\tau', \tau, t)$.
Direct numerical simulation of Eq. (30) confirms that our approximation yields curves which are indistinguishable from
those of the true model. As Fig. 16 shows, for small tick stocks this approximation indeed shows some improvement for
large $\ell$ when compared to the constant gap model. For small $\ell$ there is still the same discrepancy as earlier,
coming from errors in the data that add some spurious high frequency white noise. To account for these, we add an
effective, lag-independent constant to $D(\ell)$, whose value was chosen as $D_0 = 0.04$ ticks squared. According to
Fig. 16, this substantially improves the fit for short times, while leaving the long time contribution unaffected.

The conclusion is that our history dependent impact model reproduces the empirical average response function in 
a rather accurate way, and also improves the estimation of the diffusion curve. The discrepancies are expected, since 
we have neglected several effects, including (i) all volume dependence, (ii) unobserved events deeper in the book and 
(iii) higher order, non-linear contributions to model history dependence. 




Figure 16: $D(\ell)/\ell$ and its approximations for small tick stocks. Symbols correspond to the true result
excluding the beginning and the end of trading days, the red line corresponds to the permanent impact model with
constant gaps, the green line to the fluctuating gap model (both analytically and by simulation), and the blue line to
the fluctuating gap model plus constant. Note: unlike the small tick data in Fig. 9, the vertical axis was not
rescaled here.



7.4. Interpretation: direct impact vs. induced impact 

The relevance of the history kernels $K$ and $\tilde{K}$ can be understood through a simple argument. The sequence of
events is characterized by the time series $\{\pi_t, \epsilon_t\}$, and together with the gaps this series defines the
course of the price. How will the event $\pi$ at time $t$ affect the price at some later time $t + \ell$? This
quantity, denoted by $G^*_\pi(\ell)$, is defined as the average of the formal total derivative of the price with
respect to the past order flow:

$$ G^*_\pi(\ell) = \left\langle \frac{d p_{t+\ell}}{d \xi_{\pi,t}} \right\rangle , \qquad \xi_{\pi,t} \equiv I(\pi_t = \pi)\, \epsilon_t . \tag{36} $$

It contains two distinct contributions:

• A direct one: the immediate price change caused by the event, which is constant in time (zero or non-zero depending
on the value of the corresponding gap). For example, if right now a large buy market order is submitted, it will cause
an immediate upward jump in the (ask) price. The average of such jumps due to events of type $\pi$ is represented by
the mean realized gap $\Delta^R_\pi$.

• An induced, dynamic one: the change of the future event rates and their associated gaps. This is modeled by Eq. (27)
and quantified by the kernels $K$. To continue the above example of a large market order, it removes the best ask
level, and hence we move into a denser part of the order book. The new first gap behind the ask is on average smaller
here. So in effect, our initial event makes the ask gap shrink. In addition, some time after we submit our order,
additional sell limit orders will arrive to compensate part of our upwards price pressure. These may move the (ask)
price back downwards. If we had decided not to submit our market order, these extra limit orders would not arrive
either.



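To put this decomposition into precise quantitative terms, recall that as an exact identity,

$$ p_{t+\ell} = p_t + \sum_{t \le t' < t+\ell} \sum_{\pi_1} \Delta_{\pi_1,\epsilon_{t'},t'}\, I(\pi_{t'} = \pi_1)\, \epsilon_{t'} . \tag{37} $$

The average derivative of the price with respect to an earlier event is therefore:

$$
\begin{aligned}
\left\langle \frac{d p_{t+\ell}}{d \xi_{\pi,t}} \right\rangle
&= \Delta^R_\pi + \sum_{t < t' < t+\ell} \sum_{\pi_1} \left\langle \frac{d \left[ \Delta_{\pi_1,\epsilon_{t'},t'}\, I(\pi_{t'} = \pi_1)\, \epsilon_{t'} \right]}{d \xi_{\pi,t}} \right\rangle
\equiv \Delta^R_\pi + \sum_{t < t' < t+\ell} \sum_{\pi_1} K_{\pi,\pi_1}(t' - t) , \\
\tilde{K}_{\pi,\pi_1}(t' - t) &\equiv \Delta^R_{\pi_1} \left\langle \frac{d \left[ I(\pi_{t'} = \pi_1)\, \epsilon_{t'} \right]}{d \xi_{\pi,t}} \right\rangle ,
\end{aligned}
\tag{38}
$$

where we have introduced the formal definitions of the kernels $K$ and $\tilde{K}$, in a way compatible with the
linear model specified by Eqs. (27) and (28). Therefore, the average price change until time $t + \ell$ attributed to
an event of type $\pi$ at time $t$ is found to be:

$$ G^*_\pi(\ell) = \Delta^R_\pi + \sum_{0 < t' < \ell} \sum_{\pi_1} K_{\pi,\pi_1}(t') . \tag{39} $$

Numerically, the $G^*$'s are given in Fig. 17 (left) for small tick stocks; very similar curves were found for large
ticks but we will not detail those here.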

These $G^*$'s are, however, different from the bare response functions $G_\pi$, which are defined as a partial
derivative of the price with respect to event flow (see Eq. (16)), where all events except the one occurring at $t$
are kept fixed:

$$ G_\pi(\ell) = \left\langle \frac{\partial p_{t+\ell}}{\partial \xi_{\pi,t}} \right\rangle . \tag{40} $$

Following the logic of the previous calculation and reindexing the terms, one finds, within the linear model,

$$ G_\pi(\ell) = \Delta^R_\pi + \sum_{t < t' < t+\ell} \sum_{\pi_1} \left\langle \frac{\partial \Delta_{\pi_1,\epsilon_{t'},t'}}{\partial \xi_{\pi,t}}\, I(\pi_{t'} = \pi_1)\, \epsilon_{t'} \right\rangle = \Delta^R_\pi + \sum_{0 < t' < \ell} \sum_{\pi_1} \left[ K_{\pi,\pi_1}(t') - \tilde{K}_{\pi,\pi_1}(t') \right] . \tag{41} $$

What is the difference between Eq. (39) and Eq. (41)? In the former, we calculate the total price change until time
$t + \ell$ due to the initial event, and this includes the adaptation of future event flow and of future gaps. In the
latter, we only keep the possible jump due to this event and the adaptation of gaps, but not of the event flow, which
is assumed to be fixed. This omission is indeed consistent with Eq. (16), since the effect of event flow adaptation is
already accounted for: the equation is based on the true event flow, and hence already includes the full correlation
structure between events.

When the tick is small, the gaps are allowed to fluctuate and adapt to the order flow. The extra impact contribution 
is therefore captured by the above term: 

SGlil) ^ YH [^^Mt) - K^Mt)] ■ (42) 

0<t<f iri 

Our final model defined by Eqs. (|30p and ([5^ amounts to adding this fluctuating gap contribution to the average 
realized gap in the bare propagator, i.e., 



$$G_\pi(\ell) = \Delta^R_\pi + \delta G_\pi(\ell). \qquad (43)$$



The new second term describes the contribution of the gap "compressibility" to the impact of an event up to a time 
lag $\ell$, and it is shown in Fig. 17 (right). Perhaps surprisingly, it appears that a small market order MO⁰ "softens" the 
book for small ticks: the gaps tend to grow on average and $\delta G_{\mathrm{MO}^0}$ is positive. Price-changing events, on the other 
hand, "harden" the book: for all stocks their contribution is negative. Queue fluctuations (CA⁰ and LO⁰) seem less 
important, but for small ticks these types of events also harden the book. For large ticks the $\delta G$'s are found to be about 
two orders of magnitude smaller, which confirms that gap fluctuations can be neglected to a good approximation in 
that case. 

Figure 17: (left) Final estimate of the total average price change $G^*_\pi(\ell)$ due to an event $\pi$, based on Eq. (39), for small tick 
stocks. (right) Contribution of gap flexibility to the price change: $\delta G_\pi(\ell)$ calculated from Eq. (42) for small tick stocks. The 
curves are labeled according to $\pi$ in the legend. 



8. CONCLUSIONS 



Previous studies have focused on the impact of market orders only and have concluded that this impact decays in 
such a way as to offset the correlation of the sign of the trades. The underlying mechanism is that market orders on one 
side of the book attract compensating limit orders. These limit orders do not necessarily change the best limits, but 
are such that the conditional impact of a buy trade following other buy trades is smaller than the conditional impact 
of a sell trade following buy trades. As pointed out in Gerig IJ], the strength of this asymmetric liquidity effect is 
the dominant effect that mitigates persistent trends in prices. Our study confirms this finding: events happening on 
the same side of the book are long-range correlated, but the signed correlation function (which assigns opposite signs 
to limit orders and market orders on the same side of the book) is short ranged, demonstrating the compensating effect 
alluded to above. 

This effect leads to a strong "dressing" of the bare impact of market orders by limit orders. In fact, by including, 
besides the market orders, all limit orders and cancellations at the bid/ask, the price becomes a pure jump process. 
Every price change, whatever its cause (news, information or noise), can be attributed to exactly one of these events. 
To a first approximation, the various event types lead to a constant jump size that equals the average price change 
they cause. This simple picture works very well for large tick stocks, where both the average impact of all event types, 
and the volatility, are quantitatively reproduced by a constant jump model. The situation is different for small tick 
stocks, where the history dependence of these otherwise permanent jumps becomes important. Note that the effect 
discussed here is related to, but different from the Lillo-Farmer model that connects the temporal decay of the dressed 
market order impact to the history dependent conditional impact of a new trade. Here, we are speaking of the history 
dependence in a framework where the impact of all events, not only of market orders, is already accounted for. 
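
As a minimal sketch of the constant jump picture described above, the snippet below rebuilds a price path from a stream of (event type, sign) pairs by adding one fixed signed jump per event type. The event labels follow the text (MO0, CA0, LO0 for events that leave the price unchanged; MO', CA', LO' for price-changing ones), but the jump sizes in `REALIZED_GAP` are made-up illustrative numbers, not values estimated in this study, and `price_path` is a hypothetical helper name. The sign is taken to be the direction in which the event pushes the price.

```python
from itertools import accumulate

# Hypothetical average jump sizes (in ticks) per event type; illustrative only.
# Types ending in 0 never move the price, primed types always do.
REALIZED_GAP = {
    "MO0": 0.0, "CA0": 0.0, "LO0": 0.0,
    "MO'": 0.5, "CA'": 0.5, "LO'": 0.5,
}

def price_path(events, p0=0.0):
    """Return the price after each event under a constant jump model.

    events: iterable of (event_type, sign) with sign = +1 or -1.
    Each event shifts the price by sign * REALIZED_GAP[event_type].
    """
    jumps = [sign * REALIZED_GAP[kind] for kind, sign in events]
    # accumulate(..., initial=p0) yields p0 first, so drop that leading element.
    return list(accumulate(jumps, initial=p0))[1:]

events = [("MO'", +1), ("LO0", -1), ("LO'", -1), ("CA'", +1), ("MO0", +1)]
path = price_path(events)
```

For small tick stocks the same loop would need a history-dependent jump size in place of the constant lookup, which is exactly the correction discussed in the text.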

Another important observation is that not only are the jump sizes history dependent, but the events themselves also 
behave in an adaptive way. An event can induce further events that amplify or dampen its own effect. As is well 
known, the arrival of excess buy market orders is shortly followed by additional sell limit orders, but this is just one 
manifestation of such adaptive dynamics. For example, the reverse process, i.e. market orders following an excess 
of limit orders, is also present, albeit with some delay and a smaller intensity. Our description of these and similar 
mechanisms, also involving cancellations, is a generalization of the theory of market order price impact in the related 
literature. 

In sum, the dynamics of prices consists of three processes: instantaneous jumps due to events, events inducing further 
events and thereby affecting the future jump probabilities (described by the correlation between events), and events 
exerting pressure on the gaps behind the best price and thereby affecting the future jump sizes. By approximating 
this third effect with a linear regression process, we have written down an explicit model, Eq. (30), that accounts 
very satisfactorily for most of our observations. We have shown how to calibrate such a model on empirical data 
using some auxiliary kernels K and K defined by Eqs. (27) and (28). This way of extracting the bare propagator $G_\pi$, 
motivated by the above decomposition, seems to be less prone to numerical errors than the "brute force" inversion 
method used in Sec. 5. 

The methods proposed in this work are rather simple and general, and can be adapted to measure the impact of 
any type of trade once a discrete categorization is adopted. One could for example subdivide the category MO⁰ into 
small volumes and large volumes, or look at the impact of different option trades on the underlying, etc. Here, we 
have established that the bare impact of market orders is clearly larger than that of limit orders and that the bare 
impact of price-changing events shows only partial decay on the time scale that we are able to probe (1000 events 
correspond to only a few minutes). It would certainly be interesting to study the long time behavior of these bare 
impact functions, as well as to understand how these impact functions behave overnight. 

We hope to have provided here a consistent and complete framework to describe price fluctuations and impact 
at the finest possible scale. Our approach can be seen as a "microscopic" construction of VAR-like models, with a 
clearly motivated regression structure. We believe that the interaction between market orders and limit orders, and 
the impact of these two types of orders, are crucial to understand the dynamics of the markets, the origin of volatility 
and the incipient instabilities that can arise when these counteracting forces are not on an even keel. The interesting next 
step would be to analyze in detail these situations, where large liquidity fluctuations arise, and the above 'average' 
model breaks down. In the longer term, a worthwhile project is to construct a coarse-grained, continuous time model 
from the above microstructural bricks, and justify or reject the slew of models that have been proposed to describe 
financial time series (Levy processes, GARCH, multifractal random walk, etc.). 



Appendix A: The dependence of the spread on the event flow 

In this appendix we show how the framework introduced in the main text can be used to study the dynamics of 
the bid-ask spread. For the spread $S_t$, one can write an exact formula very similar to Eq. (21): 

$$S_{t+\ell} - S_t = \sum_{t \le t' < t+\ell} \sum_{\pi} \tilde\Delta_{\pi,\epsilon_{t'},t'}\, I(\pi_{t'} = \pi), \qquad \text{(A1)}$$

where $\tilde\Delta_\pi = \pm 2\Delta_\pi$, with the + sign for $\pi$ = MO', CA' and the − sign for $\pi$ = LO'. The other three $\tilde\Delta$'s are zero, just 
as the respective $\Delta$'s were. The above equation is accurate because our model includes all the possible events that 
can change the best quotes, and thus all the possible events that can change the spread. 
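
The bookkeeping behind this identity can be sketched in a few lines: price-changing market orders and cancellations widen the spread by twice the associated gap, price-changing limit orders narrow it by the same amount, and the three non-price-changing types leave it untouched. The record layout `(event_type, gap)` and the name `spread_path` are assumptions made for this illustration, and the numbers are invented.

```python
def spread_path(events, s0):
    """Accumulate the spread over a stream of (event_type, gap) records.

    gap is the non-negative gap Delta attached to the event, in ticks.
    The spread changes by +2*gap for MO' and CA', by -2*gap for LO',
    and not at all for the non-price-changing types.
    """
    direction = {"MO'": +1, "CA'": +1, "LO'": -1}
    spread, path = s0, []
    for kind, gap in events:
        spread += 2 * gap * direction.get(kind, 0)
        path.append(spread)
    return path

# Invented example: spread starts at 1 tick.
example = [("MO'", 0.5), ("LO0", 0.0), ("LO'", 0.5), ("CA'", 1.0)]
spreads = spread_path(example, s0=1.0)
```

Because every event that can move a best quote is included, the running sum reproduces the spread exactly; no residual term is needed.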

However, when formulating a permanent impact model for the spread dynamics in the same spirit as we did for 
the price, one should bear in mind that the spread is a mean-reverting quantity that oscillates around a mean value 
$\langle S \rangle$. In other words, the average value of $S_{t+\ell}$ when $\ell \to \infty$ is equal to $\langle S \rangle$, independently of the initial value $S_t$. 
Therefore: 

$$\lim_{\ell\to\infty} \langle S_{t+\ell} - S_t \mid S_t \rangle = \lim_{\ell\to\infty} \sum_{t \le t' < t+\ell} \sum_{\pi_2} \langle \tilde\Delta_{\pi_2,\epsilon_{t'},t'}\, I(\pi_{t'} = \pi_2) \mid S_t \rangle = \langle S \rangle - S_t. \qquad \text{(A2)}$$

Since the right hand side obviously depends on $S_t$, the conditional value $\langle \tilde\Delta_{\pi_2,\epsilon_{t'},t'}\, I(\pi_{t'} = \pi_2) \mid S_t \rangle$ also has to. This 
means that the event flow and possibly the gaps are correlated with the spread, and they adjust such that the spread 
mean reverts. If this were not true, the spread would follow an unbounded random walk. To illustrate this, the spread 
dependence of realized gaps is shown in Fig. 18. 

Figure 18: Realized gaps as a function of the spread (after removing the beginning and the end of the trading days). 



A related study by Ponzi et al. [20], based on a selection of stocks from the LSE, comes to a similar conclusion. They 
find that both realized gaps and event rates are functions of the spread. In particular, limit orders are placed deeper 
in the spread when the spread is larger. In addition, they show that the rate of transactions decreases with larger 
spreads, while the rates of cancellations and incoming limit orders increase sharply. 

Such an adaptive behavior can be quantified through Eq. (A1). For $\ell = 1$ this reads 

$$S_{t+1} - S_t = \tilde\Delta_{\mathrm{MO}',\epsilon_t,t}\, I(\pi_t = \mathrm{MO}') + \tilde\Delta_{\mathrm{CA}',\epsilon_t,t}\, I(\pi_t = \mathrm{CA}') - |\tilde\Delta_{\mathrm{LO}',\epsilon_t,t}|\, I(\pi_t = \mathrm{LO}'), \qquad \text{(A3)}$$

where we took the negative absolute value of $\tilde\Delta_{\mathrm{LO}'}$ to emphasize that it is strictly negative, while all the other 
quantities in the equation are non-negative. The unconditional expectation value of the left hand side is zero, since 
$\langle S_{t+1} \rangle = \langle S_t \rangle$. Thus on average the spread-altering effects of market orders, cancellations and limit orders balance 
out. If for example $S_t > \langle S \rangle$, then the spread must shrink back to its mean, so the left hand side must be negative. 
The only way for the right hand side to become negative as well is if the spread-opening contribution of market 
orders/cancellations decreases and/or the spread-closing effect of limit orders increases. 
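
Written out, the balance condition follows from taking the unconditional average of Eq. (A3), using $\langle X\, I(\pi_t = \pi) \rangle = P(\pi)\, \langle X \mid \pi_t = \pi \rangle$. The shorthand $\tilde\Delta^R_\pi = \langle \tilde\Delta_{\pi,\epsilon_t,t} \mid \pi_t = \pi \rangle$ for the average realized spread jump is introduced here by analogy with the realized gap of the main text and should be read as an assumption of this sketch:

```latex
0 = \langle S_{t+1} - S_t \rangle
  = P(\mathrm{MO}')\,\tilde\Delta^R_{\mathrm{MO}'}
  + P(\mathrm{CA}')\,\tilde\Delta^R_{\mathrm{CA}'}
  - P(\mathrm{LO}')\,\bigl|\tilde\Delta^R_{\mathrm{LO}'}\bigr| .
```

The spread is therefore stationary only if the average widening by price-changing market orders and cancellations, weighted by how often they occur, equals the weighted average narrowing by price-changing limit orders.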

This is possible through the variation of both the gaps and the event rates; for our purposes it is enough to introduce a 
combined description of the two. The simplest possible model of such adaptive dynamics is to assume that the conditional 
distribution of the random variable $\tilde\Delta_{\pi,\epsilon_t,t}\, I(\pi_t = \pi)$ given the current spread value $S_t$ can be approximated by that 
of 

$$\tilde\Delta^R_\pi\, I(\pi_t = \pi) \left[ 1 + \frac{\alpha}{\tilde\Delta^R_\pi} \left( \langle S \rangle_\pi - S_t \right) \right],$$

where $I(\pi_t = \pi)$ now follows its unconditional distribution, $\langle S \rangle_\pi$ is the average value of the spread at the time of 
events of type $\pi$, and $\alpha$ is a constant parameter characterizing the strength of mean-reversion. 

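Even though the term in the brackets is understood to include contributions from both gap and rate adjustments, 
technically such a model only amounts to substituting 

$$\tilde\Delta_{\pi,\epsilon_t,t} = \tilde\Delta^R_\pi + \alpha \left( \langle S \rangle_\pi - S_t \right) \qquad \text{(A4)}$$

for the gap dynamics in Eq. (A1), and this makes analytical calculations possible. One finds that the modified spread 
behavior is such that 

$$S_{t+\ell} = S_t (1-\alpha)^{\ell} + \sum_{t \le t' < t+\ell} \sum_{\pi} (1-\alpha)^{t+\ell-t'-1}\, I(\pi_{t'} = \pi) \left( \tilde\Delta^R_\pi + \alpha \langle S \rangle_\pi \right), \qquad \text{(A5)}$$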

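from which one deduces the spread response function: 

$$R^S_{\pi_1}(\ell) = \langle (S_{t+\ell} - S_t)\, I(\pi_t = \pi_1) \rangle / P(\pi_1) = \left[ \langle S \rangle - \langle S \rangle_{\pi_1} \right] \left[ 1 - (1-\alpha)^{\ell} \right] + \sum_{t \le t' < t+\ell} \sum_{\pi_2} (1-\alpha)^{t+\ell-t'-1} \left( \tilde\Delta^R_{\pi_2} + \alpha \langle S \rangle_{\pi_2} \right) P(\pi_2)\, \Pi_{\pi_1,\pi_2}(t'-t). \qquad \text{(A6)}$$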

Figure 19: Comparison of true and approximated normalized spread response function $R^S_\pi(\ell)/P(\pi)$ for (left) large tick stocks 
and (right) small tick stocks for events that do not change the price. Symbols correspond to the true value, and lines to the 
approximation. The curves are labeled according to $\pi$ in the legend. 

Figure 20: Comparison of true and approximated normalized spread response function $R^S_\pi(\ell)/P(\pi)$ for (left) large tick stocks and 
(right) small tick stocks for events that change the price. Symbols correspond to the true value, and lines to the approximation. 
The curves are labeled according to $\pi$ in the legend. 



Eq. (A6) tells us that the dynamics of the spread is related to the autocorrelation of (unsigned) event types, just 
as the response function was related to the signed event autocorrelation functions (except for the inclusion of $\alpha$ to 
describe adjustment to the event flow). 
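
As a sanity check on the mean-reverting model of Eqs. (A4) and (A5), one can iterate the one-step rule and compare the result with the closed form. In the sketch below each event of type k changes the spread by a fixed jump plus `alpha` times the distance between the typical spread at such events and the current spread, which is equivalent to S_{t+1} = (1 - alpha) S_t + jump[k] + alpha * mean_spread[k]. All parameter values, the catch-all label "other" and the function names are illustrative assumptions, not quantities fitted in this study.

```python
import random

def simulate_spread(s0, kinds, jump, mean_spread, alpha):
    """Iterate S_{t+1} = S_t + jump[k] + alpha * (mean_spread[k] - S_t)."""
    s = s0
    for k in kinds:
        s += jump[k] + alpha * (mean_spread[k] - s)
    return s

def closed_form(s0, kinds, jump, mean_spread, alpha):
    """Evaluate the right-hand side of Eq. (A5) for the same event sequence."""
    n = len(kinds)
    total = s0 * (1 - alpha) ** n
    for i, k in enumerate(kinds):
        # An event at position i is damped by the n - i - 1 events that follow it.
        total += (1 - alpha) ** (n - i - 1) * (jump[k] + alpha * mean_spread[k])
    return total

# Illustrative parameters (not fitted to data).
jump = {"MO'": 1.0, "CA'": 1.0, "LO'": -1.2, "other": 0.0}
mean_spread = {"MO'": 1.8, "CA'": 2.1, "LO'": 2.6, "other": 2.0}
rng = random.Random(0)
kinds = rng.choices(list(jump), k=200)

iterated = simulate_spread(2.0, kinds, jump, mean_spread, alpha=0.01)
unrolled = closed_form(2.0, kinds, jump, mean_spread, alpha=0.01)
```

The two numbers agree up to floating-point rounding for any event sequence, since the closed form is just the recursion unrolled. Setting `alpha=0` recovers the constant gap case, where the spread is a plain running sum of jumps.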

To test this model on real data, and to remove the effect of intraday periodicity and overnight effects, we 
neglect the first 30 and last 40 minutes of trading days, and in all correlation functions we only consider times 
when both events are within the same day. The spread response functions and their approximations without allowing 
for gap fluctuations ($\alpha = 0$) are shown in Figs. 19 and 20. The constant gap approximation works well for large tick 
stocks, but for small tick stocks only for short times, up to $\ell \approx 10$ to $30$ events. This is in line with the findings of Sec. 
6 for the response function of the price. 
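
The data selection described above can be sketched as follows: drop events in the first 30 and last 40 minutes of each session, then keep only lagged pairs whose two events fall on the same day. The session times 9:30 and 16:00, the helper names and the example timestamps are assumptions made for this illustration; the text only specifies how much is trimmed at each end.

```python
from datetime import datetime, time, timedelta

# Assumed session times; only the trimmed durations come from the text.
OPEN, CLOSE = time(9, 30), time(16, 0)

def in_window(ts, skip_open=timedelta(minutes=30), skip_close=timedelta(minutes=40)):
    """True if the timestamp falls inside the trimmed trading session."""
    start = datetime.combine(ts.date(), OPEN) + skip_open
    end = datetime.combine(ts.date(), CLOSE) - skip_close
    return start <= ts <= end

def same_day_pairs(timestamps, lag):
    """Filter events, then pair each kept event with the one `lag` events later.

    Returns (kept, pairs), where pairs holds index pairs into `kept` whose two
    events fall on the same calendar day. Timestamps must be sorted in time.
    """
    kept = [ts for ts in timestamps if in_window(ts)]
    pairs = [(i, i + lag) for i in range(len(kept) - lag)
             if kept[i].date() == kept[i + lag].date()]
    return kept, pairs

# Invented timestamps spanning two days.
stamps = [
    datetime(2008, 3, 3, 9, 45),   # dropped: inside the first 30 minutes
    datetime(2008, 3, 3, 10, 5),
    datetime(2008, 3, 3, 11, 0),
    datetime(2008, 3, 3, 15, 30),  # dropped: inside the last 40 minutes
    datetime(2008, 3, 4, 10, 10),
    datetime(2008, 3, 4, 12, 0),
]
kept, pairs = same_day_pairs(stamps, lag=1)
```

Because the kept events of a given day form one contiguous block, an index difference of `lag` in the filtered list is also a lag of `lag` events in the original stream for every same-day pair. Any correlation or response function is then averaged over `pairs` only.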

One finds that the discrepancy for small tick stocks has two origins. First, due to the intraday non-stationarity of 
the spread the relationship 



f^oo (A^2,,,_^,,t+£/(7rf = 7ri)/(7rt+^ = TTa)) 

no longer holds. After excluding the overnight contribution (when $t$ and $t + \ell$ are in different days), the gaps in the 
denominator are no longer sampled from the first $\ell$ events of the day, where they are systematically larger than at 
the end of the day. Even after adjusting the $\Pi$'s to have the correct asymptotic value, one needs to introduce $\alpha > 0$ to 
find an approximately correct shape of the spread response functions. The results for one example stock (AAPL) are 
shown in Fig. 21. 

Figure 21: Comparison of the spread response functions of AAPL for four different cases: Eq. (A6) with $\alpha = 0$ (red), Eq. (A6) 
with adjusted $\Pi$'s and $\alpha = 0$ (green), Eq. (A6) with adjusted $\Pi$'s and $\alpha$ = 10~^ (blue), true response functions (black). Price 
in ticks. 

Clearly this model is only intended as a first approximation, since it leads to an exponential decay of the spread 
autocorrelation function, in contrast with the long memory found in the data (see Fig. 22). It is possible to give a 
more complete description of the spread along the lines of Sec. 7.3, but we leave this for future research. 

Figure 22: Autocorrelation function of the spread (after removing the beginning and the end of the trading days). One can see 
that the exponential decay suggested by our simplified model captures the short-time dynamics, but not the slow decay lasting 
for thousands of events. 



Appendix B: Plots of various correlation functions 



In this appendix we show all the signed and some of the unsigned event correlation functions, for small 
and large tick stocks separately; see Figs. 23-30. 

Figure 23: The normalized, signed event correlation functions $C_{\pi_1,\pi_2}(\ell)$ for large tick stocks, (left) $\pi_1$ = MO⁰, (right) $\pi_1$ = MO'. 
The curves are labeled by their respective $\pi_2$'s in the legend. The bottom panels show the negative values. 

Figure 24: The normalized, signed event correlation functions $C_{\pi_1,\pi_2}(\ell)$ for large tick stocks, (left) $\pi_1$ = CA⁰, (right) $\pi_1$ = LO⁰. 
The curves are labeled by their respective $\pi_2$'s in the legend. The bottom panels show the negative values. 

Figure 25: The normalized, signed event correlation functions $C_{\pi_1,\pi_2}(\ell)$ for large tick stocks, (left) $\pi_1$ = CA', (right) $\pi_1$ = LO'. 
The curves are labeled by their respective $\pi_2$'s in the legend. The bottom panels show the negative values. 

Figure 26: The normalized, signed event correlation functions $C_{\pi_1,\pi_2}(\ell)$ for small tick stocks, (left) $\pi_1$ = MO⁰, (right) $\pi_1$ = MO'. 
The curves are labeled by their respective $\pi_2$'s in the legend. The bottom panels show the negative values. 

Figure 27: The normalized, signed event correlation functions $C_{\pi_1,\pi_2}(\ell)$ for small tick stocks, (left) $\pi_1$ = CA⁰, (right) $\pi_1$ = LO⁰. 
The curves correspond to the six possible values of $\pi_2$; see the legend of Fig. 26 for details. The bottom panels show the 
negative values. 

Figure 28: The normalized, signed event correlation functions $C_{\pi_1,\pi_2}(\ell)$ for small tick stocks, (left) $\pi_1$ = CA', (right) $\pi_1$ = LO'. 
The curves correspond to the six possible values of $\pi_2$; see the legend of Fig. 26 for details. The bottom panels show the 
negative values. 

Figure 29: The normalized, unsigned event correlation functions $\Pi_{\pi_1,\pi_2}(\ell)$ for large tick stocks, (left) $\pi_1$ = MO⁰, (right) 
$\pi_1$ = MO'. The curves are labeled by their respective $\pi_2$'s in the legend. The bottom panels show the negative values. 

Figure 30: The normalized, unsigned event correlation functions $\Pi_{\pi_1,\pi_2}(\ell)$ for small tick stocks, (left) $\pi_1$ = MO⁰, (right) 
$\pi_1$ = MO'. The curves correspond to the six possible values of $\pi_2$; see the legend of Fig. 29 for details. The bottom panels 
show the negative values. 



